
Hyperparameter tuning with Ray Tune

Created On: Aug 31, 2020 | Last Updated: Jun 24, 2025 | Last Verified: Nov 05, 2024

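Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often, simple things like choosing a different learning rate or changing a network layer size can have a dramatic impact on model performance.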

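Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. It includes the latest hyperparameter search algorithms, integrates with various analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.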

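In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend this tutorial from the PyTorch documentation for training a CIFAR10 image classifier.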

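As you will see, we only need to make some slight modifications. In particular, we need to

  1. wrap data loading and training in functions,

  2. make some network parameters configurable,

  3. add checkpointing (optional),

  4. and define the search space for the model tuning.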


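To run this tutorial, please make sure the following packages are installed:

  • ray[tune]: distributed hyperparameter tuning library

  • torchvision: for the data transformers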

Setup / Imports

Let's start with the imports:

from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray import train
from ray.train import Checkpoint, get_checkpoint
from ray.tune.schedulers import ASHAScheduler
import ray.cloudpickle as pickle

Most of the imports are needed for building the PyTorch model. Only the last five imports (from ray and ray.cloudpickle) are for Ray Tune.

Data loaders

We wrap the data loaders in their own function and pass a global data directory. This way we can share one data directory between different trials.

def load_data(data_dir="./data"):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )

    return trainset, testset

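Configurable neural network

We can only tune those parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers: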

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
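To see how much l1 and l2 change the size of the model being tuned, the parameter count of Net can be worked out by hand. The sketch below is a hypothetical helper (net_param_count is not part of the tutorial) that uses only the standard library; it also spells out why fc1 takes 16 * 5 * 5 inputs. If PyTorch is installed, sum(p.numel() for p in Net(l1, l2).parameters()) should give the same numbers.

```python
# Hypothetical helpers, not part of the tutorial: count the trainable
# parameters of Net(l1, l2) by hand.
def conv_params(in_ch, out_ch, kernel):
    # Weights (out_ch * in_ch * kernel * kernel) plus one bias per output channel.
    return out_ch * in_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    # Weight matrix plus bias vector.
    return in_features * out_features + out_features


def net_param_count(l1=120, l2=84):
    # A 32x32 CIFAR10 image shrinks to 28 -> 14 -> 10 -> 5 pixels per side
    # through conv1 (5x5), pool, conv2 (5x5), pool; hence 16 * 5 * 5 inputs to fc1.
    fixed = conv_params(3, 6, 5) + conv_params(6, 16, 5)
    tunable = (
        linear_params(16 * 5 * 5, l1)
        + linear_params(l1, l2)
        + linear_params(l2, 10)
    )
    return fixed + tunable


print(net_param_count())          # default sizes: 62006
print(net_param_count(256, 256))  # largest sizes in the search space: 173890
```

Nearly all of the growth comes from the fully connected layers, which is why the sampled l1 and l2 values dominate the per-trial memory footprint.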

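The train function

Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.

We wrap the training script in a function train_cifar(config, data_dir=None). The config parameter receives the hyperparameters we would like to train with. The data_dir parameter specifies the directory where we load and store the data, so that multiple runs can share the same data source. If a checkpoint is provided, we also load the model and optimizer state at the start of the run. Further down in this tutorial, you will find information on how to save the checkpoint and what it is used for.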

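net = Net(config["l1"], config["l2"])
# `optimizer` must already exist at this point so that its saved state
# can be restored (in train_cifar it is created right after the model).

checkpoint = get_checkpoint()
if checkpoint:
    with checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            checkpoint_state = pickle.load(fp)
        # The checkpoint stores the last completed epoch, so resume with the next one.
        start_epoch = checkpoint_state["epoch"] + 1
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
else:
    start_epoch = 0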

The learning rate of the optimizer is made configurable, too:

optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

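We also split the training data into a training subset and a validation subset: we train on 80% of the data and calculate the validation loss on the remaining 20%. The batch size with which we iterate through the training and validation sets is configurable as well.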
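For CIFAR10's 50,000 training images, the 80/20 split works out to 40,000 training and 10,000 validation samples. The sketch below reproduces the split arithmetic from train_cifar on a stand-in dataset of indices (an assumption, so no download is needed). Note that train_cifar calls random_split without a seed, so each trial may validate on a different subset; passing a seeded generator, as done here, is one way to make trials comparable, but it is not something the tutorial itself does.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 50,000-image CIFAR10 training set: just the sample indices.
trainset = TensorDataset(torch.arange(50_000))

test_abs = int(len(trainset) * 0.8)  # 40,000 training samples
train_subset, val_subset = random_split(
    trainset,
    [test_abs, len(trainset) - test_abs],
    # Assumption: a fixed seed (not used in the tutorial) makes the split reproducible.
    generator=torch.Generator().manual_seed(42),
)

print(len(train_subset), len(val_subset))  # 40000 10000
```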

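Adding (multi) GPU support with DataParallel

Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data-parallel training on multiple GPUs: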

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)

By using a device variable, we make sure that training also works when no GPUs are available. PyTorch requires us to send our data to GPU memory explicitly, like this:

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)

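The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials, as long as the model still fits in GPU memory. We'll come back to that later.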
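One consequence of wrapping the model in nn.DataParallel is that its state_dict keys gain a "module." prefix. This is why the main function later wraps best_trained_model in nn.DataParallel before loading a checkpoint that was saved from a wrapped network. The sketch below uses a tiny nn.Linear as a hypothetical stand-in for Net and shows the prefix, plus one way to strip it when loading into an unwrapped model (nn.DataParallel can also be constructed on a CPU-only machine).

```python
import torch
import torch.nn as nn

# A tiny stand-in model (hypothetical, not the tutorial's Net).
model = nn.Linear(4, 2)
wrapped = nn.DataParallel(model)

plain_keys = list(model.state_dict().keys())
wrapped_keys = list(wrapped.state_dict().keys())
print(plain_keys)    # ['weight', 'bias']
print(wrapped_keys)  # ['module.weight', 'module.bias']

# Strip the prefix so the wrapped checkpoint loads into an unwrapped model.
prefix = "module."
unwrapped_state = {
    (k[len(prefix):] if k.startswith(prefix) else k): v
    for k, v in wrapped.state_dict().items()
}
fresh = nn.Linear(4, 2)
fresh.load_state_dict(unwrapped_state)
print(torch.equal(fresh.weight, model.weight.cpu()))  # True
```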

Communicating with Ray Tune

The most interesting part is the communication with Ray Tune:

checkpoint_data = {
    "epoch": epoch,
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)

    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    train.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )

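Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration led to the best results. These metrics can also be used to stop badly performing trials early, so that no resources are wasted on them.

Checkpoint saving is optional; however, it is necessary if we want to use advanced schedulers like Population Based Training. Also, by saving checkpoints we can later load the trained models and validate them on a test set. Lastly, saving checkpoints is useful for fault tolerance: it allows us to interrupt training and resume it later.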
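The checkpoint is just a directory containing a pickled dictionary, so the save/restore round trip can be tried without Ray. The sketch below assumes a small nn.Linear in place of Net, a made-up epoch number, and swaps the standard-library pickle in for ray.cloudpickle; it writes the same dictionary layout as train_cifar and restores it into fresh model and optimizer objects.

```python
import pickle  # the tutorial uses ray.cloudpickle; plain pickle is swapped in here
import tempfile
from pathlib import Path

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)  # hypothetical stand-in for Net
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

checkpoint_data = {
    "epoch": 3,  # made-up value: the last completed epoch
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)

    # Later, e.g. when a trial is resumed: read the dictionary back.
    with open(data_path, "rb") as fp:
        restored = pickle.load(fp)

# Restore into freshly constructed objects.
new_net = nn.Linear(4, 2)
new_optimizer = optim.SGD(new_net.parameters(), lr=0.01, momentum=0.9)
new_net.load_state_dict(restored["net_state_dict"])
new_optimizer.load_state_dict(restored["optimizer_state_dict"])
start_epoch = restored["epoch"] + 1  # resume after the last completed epoch

print(start_epoch, torch.equal(new_net.weight, net.weight))  # 4 True
```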

Full training function

The full code example looks like this:

def train_cifar(config, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    checkpoint = get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "rb") as fp:
                checkpoint_state = pickle.load(fp)
            start_epoch = checkpoint_state["epoch"] + 1  # resume after the last completed epoch
            net.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / 2000)  # mean of the last 2000 mini-batches
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "wb") as fp:
                pickle.dump(checkpoint_data, fp)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            train.report(
                {"loss": val_loss / val_steps, "accuracy": correct / total},
                checkpoint=checkpoint,
            )

    print("Finished Training")

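As you can see, most of the code is adapted directly from the original example.

Test set accuracy

Commonly, the performance of a machine learning model is tested on a hold-out test set containing data that has not been used for training the model. We also wrap this in a function: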

def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2
    )

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

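The function also expects a device parameter, so we can run the test set validation on a GPU.

Configuring the search space

Lastly, we need to define Ray Tune's search space. Here is an example: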
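The accuracy bookkeeping in test_accuracy (and in the validation loop of train_cifar) reduces each row of logits to its arg-max class and counts matches. The sketch below runs that same logic on a hand-written batch of four made-up logit rows and labels, so the result can be checked by eye.

```python
import torch

# Made-up logits for a batch of 4 images and 3 classes.
outputs = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.1, 0.4, 0.2],
])
labels = torch.tensor([0, 1, 0, 0])

_, predicted = torch.max(outputs, 1)  # index of the largest logit in each row
total = labels.size(0)
correct = (predicted == labels).sum().item()
print(predicted.tolist(), correct / total)  # [0, 1, 2, 0] 0.75
```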

config = {
    "l1": tune.choice([2 ** i for i in range(9)]),
    "l2": tune.choice([2 ** i for i in range(9)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}

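tune.choice() accepts a list of values that are sampled uniformly. In this example, the l1 and l2 parameters are powers of 2 between 1 and 256, i.e. 1, 2, 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1, i.e. uniformly on a logarithmic scale. Lastly, the batch size is a choice between 2, 4, 8, and 16.

For each trial, Ray Tune randomly samples one combination of parameters from this search space. It then trains a number of models in parallel (num_samples trials in total) and finds the best-performing one among them. We also use the ASHAScheduler, which terminates badly performing trials early.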
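To make "log-uniform" concrete, the sketch below draws configurations from an equivalent distribution using only the standard library: a learning rate is drawn uniformly in log space and then exponentiated, so each decade (1e-4 to 1e-3, 1e-3 to 1e-2, 1e-2 to 1e-1) is roughly equally likely. sample_config is a hypothetical helper; it mirrors the search space above but is not how Ray Tune implements its samplers internally.

```python
import math
import random

rng = random.Random(0)  # seeded so the sketch is reproducible


def sample_config(rng):
    # Hypothetical stand-in for Ray Tune's samplers, standard library only.
    return {
        "l1": rng.choice([2 ** i for i in range(9)]),
        "l2": rng.choice([2 ** i for i in range(9)]),
        # Log-uniform: uniform in log space, then mapped back with exp().
        "lr": math.exp(rng.uniform(math.log(1e-4), math.log(1e-1))),
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


samples = [sample_config(rng) for _ in range(1000)]
low_decade = sum(1 for s in samples if s["lr"] < 1e-3)
print(low_decade)  # roughly a third of the 1000 samples
```

A plain uniform draw between 0.0001 and 0.1 would put only about 1% of the samples below 0.001, which is why log-uniform sampling is the usual choice for learning rates.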
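As a rough illustration of how ASHA decides which trials to terminate, the sketch below assumes a simplified version of the rule: with grace_period=1, reduction_factor=2, and max_t=10 (the values used in main), trials are compared at iterations 1, 2, 4, and 8, and at each of these rungs a trial continues only if its loss is within the best 1/reduction_factor of the losses recorded there so far. rung_milestones and should_stop are hypothetical helpers, not Ray Tune functions, and Ray's actual implementation has more details. The example values are the iteration-1 losses from the run logged below; the exact cutoff Ray Tune applied depended on which results had been recorded at that moment.

```python
import statistics


def rung_milestones(max_t, grace_period, reduction_factor):
    # Simplified: iterations at which trials are compared,
    # grace_period * reduction_factor**k up to max_t.
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def should_stop(loss, losses_at_rung, reduction_factor):
    # Simplified rule for mode="min": stop the trial if its loss is worse than
    # the 1/reduction_factor quantile of all losses recorded at this rung.
    results = losses_at_rung + [loss]
    if len(results) < 2:
        return False  # nothing to compare against yet
    cutoff = statistics.quantiles(results, n=reduction_factor, method="inclusive")[0]
    return loss > cutoff


print(rung_milestones(max_t=10, grace_period=1, reduction_factor=2))  # [1, 2, 4, 8]
print(should_stop(2.3102, [2.24982, 1.9554, 2.26399], reduction_factor=2))  # True
```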

We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune what resources should be available for each trial:

gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
)

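You can specify the number of CPUs, which are then available, for example, to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that haven't been requested for them, so you don't have to worry about two trials using the same set of resources.

Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs with each other. You just have to make sure that the models still fit in GPU memory.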
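The per-trial resource request determines how many trials can run at the same time. The sketch below is a simplified model of that arithmetic for a single machine (max_concurrent_trials is a hypothetical helper, and real Ray scheduling has more constraints): a trial starts only if its full CPU and GPU request still fits. For the run logged below (16 CPUs, 1 GPU, {"cpu": 2, "gpu": 0} per trial), it predicts the 8 concurrently running trials that the status table shows.

```python
import math


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    # Simplified view of Ray's logical resource accounting on one machine:
    # the tightest of the CPU and GPU limits wins.
    limits = [math.floor(total_cpus / cpus_per_trial)]
    if gpus_per_trial > 0:
        limits.append(math.floor(total_gpus / gpus_per_trial))
    return min(limits)


print(max_concurrent_trials(16, 1, 2, 0))    # 8: CPU-bound, GPU unused
print(max_concurrent_trials(16, 1, 2, 0.5))  # 2: two trials share one GPU
print(max_concurrent_trials(16, 1, 2, 1))    # 1: one whole GPU per trial
```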

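After training the models, we select the best-performing trial (the one with the lowest final validation loss) and load its trained network from the checkpoint with the highest validation accuracy. We then compute the test set accuracy and report everything by printing.

The full main function looks like this: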
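The selection logic in main uses two different criteria: get_best_trial("loss", "min", "last") compares trials by their last reported loss, while get_best_checkpoint(..., metric="accuracy", mode="max") picks the best checkpoint within that trial. The plain-Python sketch below mimics both steps on a snapshot of the results in the status table logged at 19:12:03 below; the results dictionary and both helpers are hypothetical stand-ins, not Ray's API.

```python
# Snapshot of reported results per trial (from the status table in the log below).
# Hypothetical plain-Python stand-in, not Ray's ExperimentAnalysis API.
results = {
    "00004": [{"loss": 2.24982, "accuracy": 0.1386}, {"loss": 2.25198, "accuracy": 0.1362}],
    "00005": [{"loss": 1.9554, "accuracy": 0.1744}],
    "00006": [{"loss": 2.3102, "accuracy": 0.1004}],
    "00007": [{"loss": 2.26399, "accuracy": 0.1687}],
}


def best_trial_by_last(results, metric, mode):
    # scope="last": compare each trial only by its most recent report.
    pick = min if mode == "min" else max
    return pick(results, key=lambda trial_id: results[trial_id][-1][metric])


def best_report_in_trial(reports, metric, mode):
    # Within one trial, pick the report (and thus checkpoint) with the best metric.
    pick = min if mode == "min" else max
    return pick(range(len(reports)), key=lambda i: reports[i][metric])


print(best_trial_by_last(results, "loss", "min"))                  # 00005
print(best_report_in_trial(results["00004"], "accuracy", "max"))  # 0
```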

def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.choice([2**i for i in range(9)]),
        "l2": tune.choice([2**i for i in range(9)]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16]),
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2,
    )
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
    )

    best_trial = result.get_best_trial("loss", "min", "last")
    print(f"Best trial config: {best_trial.config}")
    print(f"Best trial final validation loss: {best_trial.last_result['loss']}")
    print(f"Best trial final validation accuracy: {best_trial.last_result['accuracy']}")

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint = result.get_best_checkpoint(trial=best_trial, metric="accuracy", mode="max")
    with best_checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            best_checkpoint_data = pickle.load(fp)

        best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
        test_acc = test_accuracy(best_trained_model, device)
        print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
100%|██████████| 170M/170M [00:01<00:00, 103MB/s]
2025-10-15 19:11:01,701 WARNING services.py:1889 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147467264 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2025-10-15 19:11:01,865 INFO worker.py:1642 -- Started a local Ray instance.
2025-10-15 19:11:02,776 INFO tune.py:228 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
2025-10-15 19:11:02,778 INFO tune.py:654 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     train_cifar_2025-10-15_19-11-02   │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator             │
│ Scheduler                        AsyncHyperBandScheduler           │
│ Number of trials                 10                                │
╰────────────────────────────────────────────────────────────────────╯

View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02
To visualize your results with TensorBoard, run: `tensorboard --logdir /var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02`

Trial status: 10 PENDING
Current time: 2025-10-15 19:11:03. Total running time: 0s
Logical resource usage: 14.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status       l1     l2            lr     batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00000   PENDING       1      1   0.00255541               2 │
│ train_cifar_b12d1_00001   PENDING     256      8   0.00137364               4 │
│ train_cifar_b12d1_00002   PENDING       8      2   0.0465214                4 │
│ train_cifar_b12d1_00003   PENDING      16     16   0.000173963              2 │
│ train_cifar_b12d1_00004   PENDING       4      8   0.0498037               16 │
│ train_cifar_b12d1_00005   PENDING      16      1   0.002926                 8 │
│ train_cifar_b12d1_00006   PENDING       2     32   0.0314836                8 │
│ train_cifar_b12d1_00007   PENDING       2      8   0.000201703              8 │
│ train_cifar_b12d1_00008   PENDING       1     32   0.0132428               16 │
│ train_cifar_b12d1_00009   PENDING      16    128   0.0250987                4 │
╰───────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00001 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     4 │
│ l1                                           256 │
│ l2                                             8 │
│ lr                                       0.00137 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00006 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00006 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     8 │
│ l1                                             2 │
│ l2                                            32 │
│ lr                                       0.03148 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00002 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00002 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     4 │
│ l1                                             8 │
│ l2                                             2 │
│ lr                                       0.04652 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00005 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     8 │
│ l1                                            16 │
│ l2                                             1 │
│ lr                                       0.00293 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     2 │
│ l1                                            16 │
│ l2                                            16 │
│ lr                                       0.00017 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00000 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00000 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     2 │
│ l1                                             1 │
│ l2                                             1 │
│ lr                                       0.00256 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00004 started with configuration:
╭─────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 config            │
├─────────────────────────────────────────────────┤
│ batch_size                                   16 │
│ l1                                            4 │
│ l2                                            8 │
│ lr                                       0.0498 │
╰─────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00007 started with configuration:
╭─────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00007 config            │
├─────────────────────────────────────────────────┤
│ batch_size                                    8 │
│ l1                                            2 │
│ l2                                            8 │
│ lr                                       0.0002 │
╰─────────────────────────────────────────────────╯
(func pid=3985) [1,  2000] loss: 2.309
(func pid=3985) [1,  4000] loss: 1.142 [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.rayai.org.cn/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)

Trial train_cifar_b12d1_00004 finished iteration 1 at 2025-10-15 19:11:33. Total running time: 30s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  25.49506 │
│ time_total_s                                      25.49506 │
│ training_iteration                                       1 │
│ accuracy                                            0.1386 │
│ loss                                               2.24982 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000000
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000000)

Trial status: 8 RUNNING | 2 PENDING
Current time: 2025-10-15 19:11:33. Total running time: 30s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00000   RUNNING       1      1   0.00255541               2                                                    │
│ train_cifar_b12d1_00001   RUNNING     256      8   0.00137364               4                                                    │
│ train_cifar_b12d1_00002   RUNNING       8      2   0.0465214                4                                                    │
│ train_cifar_b12d1_00003   RUNNING      16     16   0.000173963              2                                                    │
│ train_cifar_b12d1_00004   RUNNING       4      8   0.0498037               16        1            25.4951   2.24982       0.1386 │
│ train_cifar_b12d1_00005   RUNNING      16      1   0.002926                 8                                                    │
│ train_cifar_b12d1_00006   RUNNING       2     32   0.0314836                8                                                    │
│ train_cifar_b12d1_00007   RUNNING       2      8   0.000201703              8                                                    │
│ train_cifar_b12d1_00008   PENDING       1     32   0.0132428               16                                                    │
│ train_cifar_b12d1_00009   PENDING      16    128   0.0250987                4                                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [1,  6000] loss: 0.739 [repeated 7x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 1 at 2025-10-15 19:11:49. Total running time: 46s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  41.73979 │
│ time_total_s                                      41.73979 │
│ training_iteration                                       1 │
│ accuracy                                            0.1744 │
│ loss                                                1.9554 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000000
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000000)

Trial train_cifar_b12d1_00007 finished iteration 1 at 2025-10-15 19:11:49. Total running time: 46s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00007 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  42.02243 │
│ time_total_s                                      42.02243 │
│ training_iteration                                       1 │
│ accuracy                                            0.1687 │
│ loss                                               2.26399 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00007 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00007_7_batch_size=8,l1=2,l2=8,lr=0.0002_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00007 completed after 1 iterations at 2025-10-15 19:11:49. Total running time: 46s
(func pid=3989) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00007_7_batch_size=8,l1=2,l2=8,lr=0.0002_2025-10-15_19-11-02/checkpoint_000000)

Trial train_cifar_b12d1_00008 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00008 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                    16 │
│ l1                                             1 │
│ l2                                            32 │
│ lr                                       0.01324 │
╰──────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00006 finished iteration 1 at 2025-10-15 19:11:49. Total running time: 46s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00006 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  42.43598 │
│ time_total_s                                      42.43598 │
│ training_iteration                                       1 │
│ accuracy                                            0.1004 │
│ loss                                                2.3102 │
╰────────────────────────────────────────────────────────────╯
(func pid=3986) [2,  2000] loss: 2.252 [repeated 4x across cluster]

Trial train_cifar_b12d1_00006 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00006_6_batch_size=8,l1=2,l2=32,lr=0.0315_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00006 completed after 1 iterations at 2025-10-15 19:11:49. Total running time: 46s

Trial train_cifar_b12d1_00009 started with configuration:
╭─────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00009 config            │
├─────────────────────────────────────────────────┤
│ batch_size                                    4 │
│ l1                                           16 │
│ l2                                          128 │
│ lr                                       0.0251 │
╰─────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00004 finished iteration 2 at 2025-10-15 19:11:55. Total running time: 53s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000001 │
│ time_this_iter_s                                  22.83439 │
│ time_total_s                                      48.32945 │
│ training_iteration                                       2 │
│ accuracy                                            0.1362 │
│ loss                                               2.25198 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000001
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000001) [repeated 2x across cluster]
(func pid=3983) [1,  8000] loss: 0.410 [repeated 4x across cluster]
(func pid=3985) [1, 10000] loss: 0.399

Trial status: 8 RUNNING | 2 TERMINATED
Current time: 2025-10-15 19:12:03. Total running time: 1min 0s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00000   RUNNING         1      1   0.00255541               2                                                    │
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4                                                    │
│ train_cifar_b12d1_00002   RUNNING         8      2   0.0465214                4                                                    │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2                                                    │
│ train_cifar_b12d1_00004   RUNNING         4      8   0.0498037               16        2            48.3295   2.25198       0.1362 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        1            41.7398   1.9554        0.1744 │
│ train_cifar_b12d1_00008   RUNNING         1     32   0.0132428               16                                                    │
│ train_cifar_b12d1_00009   RUNNING        16    128   0.0250987                4                                                    │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3987) [2,  2000] loss: 1.948
(func pid=3986) [3,  2000] loss: 2.255 [repeated 6x across cluster]

Trial train_cifar_b12d1_00008 finished iteration 1 at 2025-10-15 19:12:13. Total running time: 1min 11s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00008 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                   24.3124 │
│ time_total_s                                       24.3124 │
│ training_iteration                                       1 │
│ accuracy                                            0.1027 │
│ loss                                               2.30466 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00008 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00008_8_batch_size=16,l1=1,l2=32,lr=0.0132_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00008 completed after 1 iterations at 2025-10-15 19:12:13. Total running time: 1min 11s
(func pid=3989) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00008_8_batch_size=16,l1=1,l2=32,lr=0.0132_2025-10-15_19-11-02/checkpoint_000000)

Trial train_cifar_b12d1_00004 finished iteration 3 at 2025-10-15 19:12:18. Total running time: 1min 15s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000002 │
│ time_this_iter_s                                  22.12243 │
│ time_total_s                                      70.45188 │
│ training_iteration                                       3 │
│ accuracy                                            0.1349 │
│ loss                                               2.25246 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000002
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000002)

Trial train_cifar_b12d1_00002 finished iteration 1 at 2025-10-15 19:12:18. Total running time: 1min 15s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00002 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  71.25909 │
│ time_total_s                                      71.25909 │
│ training_iteration                                       1 │
│ accuracy                                            0.1011 │
│ loss                                               2.35082 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00002 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00002_2_batch_size=4,l1=8,l2=2,lr=0.0465_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00002 completed after 1 iterations at 2025-10-15 19:12:18. Total running time: 1min 15s

Trial train_cifar_b12d1_00001 finished iteration 1 at 2025-10-15 19:12:20. Total running time: 1min 18s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  73.62148 │
│ time_total_s                                      73.62148 │
│ training_iteration                                       1 │
│ accuracy                                            0.4425 │
│ loss                                               1.54805 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000000
(func pid=3985) [1, 14000] loss: 0.261 [repeated 5x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 2 at 2025-10-15 19:12:25. Total running time: 1min 22s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000001 │
│ time_this_iter_s                                  36.40886 │
│ time_total_s                                      78.14865 │
│ training_iteration                                       2 │
│ accuracy                                            0.2135 │
│ loss                                               1.87288 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000001
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000001) [repeated 3x across cluster]
(func pid=3985) [1, 16000] loss: 0.222 [repeated 3x across cluster]

Trial status: 6 RUNNING | 4 TERMINATED
Current time: 2025-10-15 19:12:33. Total running time: 1min 30s
Logical resource usage: 12.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00000   RUNNING         1      1   0.00255541               2                                                    │
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        1            73.6215   1.54805       0.4425 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2                                                    │
│ train_cifar_b12d1_00004   RUNNING         4      8   0.0498037               16        3            70.4519   2.25246       0.1349 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        2            78.1487   1.87288       0.2135 │
│ train_cifar_b12d1_00009   RUNNING        16    128   0.0250987                4                                                    │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00004 finished iteration 4 at 2025-10-15 19:12:35. Total running time: 1min 32s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000003 │
│ time_this_iter_s                                  17.38521 │
│ time_total_s                                      87.83709 │
│ training_iteration                                       4 │
│ accuracy                                            0.1386 │
│ loss                                               2.25213 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000003
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000003)
(func pid=3987) [3,  2000] loss: 1.877 [repeated 5x across cluster]
(func pid=3985) [1, 20000] loss: 0.169 [repeated 5x across cluster]

Trial train_cifar_b12d1_00009 finished iteration 1 at 2025-10-15 19:12:51. Total running time: 1min 48s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00009 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  61.69678 │
│ time_total_s                                      61.69678 │
│ training_iteration                                       1 │
│ accuracy                                            0.1027 │
│ loss                                               2.32275 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00009 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00009_9_batch_size=4,l1=16,l2=128,lr=0.0251_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00009 completed after 1 iterations at 2025-10-15 19:12:51. Total running time: 1min 48s
(func pid=3988) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00009_9_batch_size=4,l1=16,l2=128,lr=0.0251_2025-10-15_19-11-02/checkpoint_000000)

Trial train_cifar_b12d1_00004 finished iteration 5 at 2025-10-15 19:12:52. Total running time: 1min 49s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000004 │
│ time_this_iter_s                                   17.2953 │
│ time_total_s                                     105.13238 │
│ training_iteration                                       5 │
│ accuracy                                            0.1349 │
│ loss                                               2.25508 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000004

Trial train_cifar_b12d1_00005 finished iteration 3 at 2025-10-15 19:12:54. Total running time: 1min 51s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000002 │
│ time_this_iter_s                                  29.15698 │
│ time_total_s                                     107.30563 │
│ training_iteration                                       3 │
│ accuracy                                            0.2519 │
│ loss                                               1.83052 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000002
(func pid=3983) [2,  8000] loss: 0.345 [repeated 5x across cluster]

Trial train_cifar_b12d1_00003 finished iteration 1 at 2025-10-15 19:13:01. Total running time: 1min 58s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                   113.724 │
│ time_total_s                                       113.724 │
│ training_iteration                                       1 │
│ accuracy                                            0.3598 │
│ loss                                               1.68835 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000000
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000000) [repeated 3x across cluster]

Trial train_cifar_b12d1_00000 finished iteration 1 at 2025-10-15 19:13:01. Total running time: 1min 58s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00000 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                 114.23864 │
│ time_total_s                                     114.23864 │
│ training_iteration                                       1 │
│ accuracy                                            0.1023 │
│ loss                                               2.30465 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00000 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00000_0_batch_size=2,l1=1,l2=1,lr=0.0026_2025-10-15_19-11-02/checkpoint_000000

Trial train_cifar_b12d1_00000 completed after 1 iterations at 2025-10-15 19:13:01. Total running time: 1min 58s

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2025-10-15 19:13:03. Total running time: 2min 0s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        1            73.6215   1.54805       0.4425 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        1           113.724    1.68835       0.3598 │
│ train_cifar_b12d1_00004   RUNNING         4      8   0.0498037               16        5           105.132    2.25508       0.1349 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        3           107.306    1.83052       0.2519 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3986) [6,  2000] loss: 2.254
(func pid=3987) [4,  2000] loss: 1.849
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000005) [repeated 2x across cluster]

Trial train_cifar_b12d1_00004 finished iteration 6 at 2025-10-15 19:13:08. Total running time: 2min 5s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000005 │
│ time_this_iter_s                                  15.48108 │
│ time_total_s                                     120.61347 │
│ training_iteration                                       6 │
│ accuracy                                            0.1362 │
│ loss                                               2.25259 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000005

Trial train_cifar_b12d1_00001 finished iteration 2 at 2025-10-15 19:13:11. Total running time: 2min 8s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000001 │
│ time_this_iter_s                                  50.80331 │
│ time_total_s                                     124.42479 │
│ training_iteration                                       2 │
│ accuracy                                            0.5174 │
│ loss                                               1.37102 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000001
(func pid=3987) [4,  4000] loss: 0.924 [repeated 3x across cluster]
(func pid=3986) [7,  2000] loss: 2.252 [repeated 2x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 4 at 2025-10-15 19:13:19. Total running time: 2min 16s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000003 │
│ time_this_iter_s                                  24.67681 │
│ time_total_s                                     131.98245 │
│ training_iteration                                       4 │
│ accuracy                                            0.2711 │
│ loss                                               1.80793 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000003
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000003) [repeated 2x across cluster]

Trial train_cifar_b12d1_00004 finished iteration 7 at 2025-10-15 19:13:22. Total running time: 2min 19s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000006 │
│ time_this_iter_s                                  14.28356 │
│ time_total_s                                     134.89702 │
│ training_iteration                                       7 │
│ accuracy                                            0.1349 │
│ loss                                               2.25291 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000006
(func pid=3987) [5,  2000] loss: 1.836 [repeated 3x across cluster]
(func pid=3986) [8,  2000] loss: 2.255 [repeated 3x across cluster]

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2025-10-15 19:13:33. Total running time: 2min 30s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        2           124.425    1.37102       0.5174 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        1           113.724    1.68835       0.3598 │
│ train_cifar_b12d1_00004   RUNNING         4      8   0.0498037               16        7           134.897    2.25291       0.1349 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        4           131.982    1.80793       0.2711 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00004 finished iteration 8 at 2025-10-15 19:13:36. Total running time: 2min 33s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000007 │
│ time_this_iter_s                                  14.16299 │
│ time_total_s                                     149.06001 │
│ training_iteration                                       8 │
│ accuracy                                            0.1416 │
│ loss                                               2.25392 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000007
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000007) [repeated 2x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 5 at 2025-10-15 19:13:42. Total running time: 2min 39s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000004 │
│ time_this_iter_s                                  23.29912 │
│ time_total_s                                     155.28157 │
│ training_iteration                                       5 │
│ accuracy                                              0.27 │
│ loss                                               1.81683 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000004
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000004)
(func pid=3985) [2, 12000] loss: 0.256 [repeated 4x across cluster]
(func pid=3985) [2, 14000] loss: 0.219 [repeated 3x across cluster]

Trial train_cifar_b12d1_00004 finished iteration 9 at 2025-10-15 19:13:51. Total running time: 2min 48s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000008 │
│ time_this_iter_s                                  14.34101 │
│ time_total_s                                     163.40102 │
│ training_iteration                                       9 │
│ accuracy                                            0.1416 │
│ loss                                                2.2467 │
╰────────────────────────────────────────────────────────────╯
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000008)
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000008

Trial train_cifar_b12d1_00001 finished iteration 3 at 2025-10-15 19:13:56. Total running time: 2min 53s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000002 │
│ time_this_iter_s                                  45.04538 │
│ time_total_s                                     169.47017 │
│ training_iteration                                       3 │
│ accuracy                                            0.5503 │
│ loss                                               1.27697 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000002
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000002)
(func pid=3985) [2, 16000] loss: 0.188 [repeated 3x across cluster]

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2025-10-15 19:14:03. Total running time: 3min 0s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        3           169.47     1.27697       0.5503 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        1           113.724    1.68835       0.3598 │
│ train_cifar_b12d1_00004   RUNNING         4      8   0.0498037               16        9           163.401    2.2467        0.1416 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        5           155.282    1.81683       0.27   │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [2, 18000] loss: 0.168 [repeated 3x across cluster]

Trial train_cifar_b12d1_00004 finished iteration 10 at 2025-10-15 19:14:05. Total running time: 3min 2s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                  14.11064 │
│ time_total_s                                     177.51166 │
│ training_iteration                                      10 │
│ accuracy                                            0.1416 │
│ loss                                               2.24623 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00004 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000009

Trial train_cifar_b12d1_00004 completed after 10 iterations at 2025-10-15 19:14:05. Total running time: 3min 2s
(func pid=3986) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00004_4_batch_size=16,l1=4,l2=8,lr=0.0498_2025-10-15_19-11-02/checkpoint_000009)

Trial train_cifar_b12d1_00005 finished iteration 6 at 2025-10-15 19:14:06. Total running time: 3min 3s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000005 │
│ time_this_iter_s                                  23.82316 │
│ time_total_s                                     179.10473 │
│ training_iteration                                       6 │
│ accuracy                                            0.2791 │
│ loss                                               1.79609 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000005
(func pid=3985) [2, 20000] loss: 0.147 [repeated 2x across cluster]
(func pid=3983) [4,  6000] loss: 0.386 [repeated 3x across cluster]

Trial train_cifar_b12d1_00003 finished iteration 2 at 2025-10-15 19:14:19. Total running time: 3min 16s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000001 │
│ time_this_iter_s                                  78.25734 │
│ time_total_s                                     191.98135 │
│ training_iteration                                       2 │
│ accuracy                                            0.4398 │
│ loss                                               1.51965 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000001
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000001) [repeated 2x across cluster]
(func pid=3985) [3,  2000] loss: 1.462 [repeated 2x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 7 at 2025-10-15 19:14:28. Total running time: 3min 26s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000006 │
│ time_this_iter_s                                  22.33127 │
│ time_total_s                                       201.436 │
│ training_iteration                                       7 │
│ accuracy                                            0.3025 │
│ loss                                               1.81516 │
╰────────────────────────────────────────────────────────────╯
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000006)
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000006
(func pid=3985) [3,  4000] loss: 0.707 [repeated 2x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-10-15 19:14:33. Total running time: 3min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        3           169.47     1.27697       0.5503 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        2           191.981    1.51965       0.4398 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        7           201.436    1.81516       0.3025 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00001 finished iteration 4 at 2025-10-15 19:14:37. Total running time: 3min 34s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000003 │
│ time_this_iter_s                                  40.77438 │
│ time_total_s                                     210.24455 │
│ training_iteration                                       4 │
│ accuracy                                            0.5838 │
│ loss                                               1.19521 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000003
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000003)
(func pid=3985) [3,  6000] loss: 0.481 [repeated 3x across cluster]
(func pid=3983) [5,  2000] loss: 1.040
(func pid=3987) [8,  4000] loss: 0.887
(func pid=3983) [5,  4000] loss: 0.545 [repeated 2x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 8 at 2025-10-15 19:14:51. Total running time: 3min 48s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000007 │
│ time_this_iter_s                                  22.12751 │
│ time_total_s                                     223.56351 │
│ training_iteration                                       8 │
│ accuracy                                            0.3012 │
│ loss                                               1.79996 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000007
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000007)
(func pid=3985) [3, 12000] loss: 0.232 [repeated 2x across cluster]
(func pid=3985) [3, 14000] loss: 0.199 [repeated 3x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-10-15 19:15:03. Total running time: 4min 0s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        4           210.245    1.19521       0.5838 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        2           191.981    1.51965       0.4398 │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        8           223.564    1.79996       0.3012 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [3, 16000] loss: 0.171 [repeated 3x across cluster]

Trial train_cifar_b12d1_00005 finished iteration 9 at 2025-10-15 19:15:12. Total running time: 4min 10s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000008 │
│ time_this_iter_s                                  21.93643 │
│ time_total_s                                     245.49994 │
│ training_iteration                                       9 │
│ accuracy                                            0.2995 │
│ loss                                               1.75839 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000008
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000008)
(func pid=3985) [3, 18000] loss: 0.151 [repeated 2x across cluster]

Trial train_cifar_b12d1_00001 finished iteration 5 at 2025-10-15 19:15:16. Total running time: 4min 14s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000004 │
│ time_this_iter_s                                  39.37837 │
│ time_total_s                                     249.62292 │
│ training_iteration                                       5 │
│ accuracy                                            0.5865 │
│ loss                                               1.19623 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000004
(func pid=3985) [3, 20000] loss: 0.136
(func pid=3987) [10,  2000] loss: 1.733
(func pid=3987) [10,  4000] loss: 0.871 [repeated 2x across cluster]

Trial train_cifar_b12d1_00003 finished iteration 3 at 2025-10-15 19:15:29. Total running time: 4min 26s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000002 │
│ time_this_iter_s                                  69.96989 │
│ time_total_s                                     261.95124 │
│ training_iteration                                       3 │
│ accuracy                                             0.498 │
│ loss                                               1.37189 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000002
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000002) [repeated 2x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-10-15 19:15:33. Total running time: 4min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        5           249.623    1.19623       0.5865 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        3           261.951    1.37189       0.498  │
│ train_cifar_b12d1_00005   RUNNING        16      1   0.002926                 8        9           245.5      1.75839       0.2995 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00005 finished iteration 10 at 2025-10-15 19:15:35. Total running time: 4min 32s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                  22.22139 │
│ time_total_s                                     267.72132 │
│ training_iteration                                      10 │
│ accuracy                                            0.2824 │
│ loss                                               1.80538 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00005 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000009

Trial train_cifar_b12d1_00005 completed after 10 iterations at 2025-10-15 19:15:35. Total running time: 4min 32s
(func pid=3985) [4,  2000] loss: 1.314 [repeated 2x across cluster]
(func pid=3987) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00005_5_batch_size=8,l1=16,l2=1,lr=0.0029_2025-10-15_19-11-02/checkpoint_000009)
(func pid=3985) [4,  4000] loss: 0.649 [repeated 2x across cluster]
(func pid=3985) [4,  6000] loss: 0.441 [repeated 2x across cluster]
(func pid=3985) [4,  8000] loss: 0.332 [repeated 2x across cluster]

Trial train_cifar_b12d1_00001 finished iteration 6 at 2025-10-15 19:15:55. Total running time: 4min 52s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000005 │
│ time_this_iter_s                                  38.39032 │
│ time_total_s                                     288.01324 │
│ training_iteration                                       6 │
│ accuracy                                            0.5776 │
│ loss                                               1.25442 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000005
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000005)
(func pid=3985) [4, 10000] loss: 0.261
(func pid=3983) [7,  2000] loss: 0.909

Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2025-10-15 19:16:03. Total running time: 5min 1s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        6           288.013    1.25442       0.5776 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        3           261.951    1.37189       0.498  │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3983) [7,  4000] loss: 0.461 [repeated 2x across cluster]
(func pid=3983) [7,  6000] loss: 0.318 [repeated 2x across cluster]
(func pid=3985) [4, 18000] loss: 0.144 [repeated 2x across cluster]
(func pid=3983) [7, 10000] loss: 0.197 [repeated 3x across cluster]

Trial train_cifar_b12d1_00001 finished iteration 7 at 2025-10-15 19:16:31. Total running time: 5min 28s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000006 │
│ time_this_iter_s                                  35.91098 │
│ time_total_s                                     323.92423 │
│ training_iteration                                       7 │
│ accuracy                                            0.5988 │
│ loss                                               1.19537 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000006
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000006)

Trial train_cifar_b12d1_00003 finished iteration 4 at 2025-10-15 19:16:33. Total running time: 5min 30s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000003 │
│ time_this_iter_s                                  64.35356 │
│ time_total_s                                      326.3048 │
│ training_iteration                                       4 │
│ accuracy                                            0.5263 │
│ loss                                               1.31372 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000003
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000003)

Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2025-10-15 19:16:33. Total running time: 5min 31s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        7           323.924    1.19537       0.5988 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        4           326.305    1.31372       0.5263 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3983) [8,  2000] loss: 0.845
(func pid=3985) [5,  2000] loss: 1.256
(func pid=3985) [5,  4000] loss: 0.629 [repeated 2x across cluster]
(func pid=3983) [8,  6000] loss: 0.302
(func pid=3985) [5,  6000] loss: 0.419
(func pid=3985) [5,  8000] loss: 0.315
(func pid=3983) [8,  8000] loss: 0.232
(func pid=3985) [5, 10000] loss: 0.252
(func pid=3983) [8, 10000] loss: 0.188
Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2025-10-15 19:17:03. Total running time: 6min 1s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        7           323.924    1.19537       0.5988 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        4           326.305    1.31372       0.5263 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00001 finished iteration 8 at 2025-10-15 19:17:07. Total running time: 6min 4s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000007 │
│ time_this_iter_s                                   36.1647 │
│ time_total_s                                     360.08892 │
│ training_iteration                                       8 │
│ accuracy                                            0.5938 │
│ loss                                               1.23727 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000007
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000007)
(func pid=3985) [5, 14000] loss: 0.176 [repeated 2x across cluster]
(func pid=3985) [5, 16000] loss: 0.154 [repeated 2x across cluster]
(func pid=3983) [9,  6000] loss: 0.282 [repeated 3x across cluster]
(func pid=3983) [9,  8000] loss: 0.218 [repeated 2x across cluster]

Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2025-10-15 19:17:34. Total running time: 6min 31s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        8           360.089    1.23727       0.5938 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        4           326.305    1.31372       0.5263 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 finished iteration 5 at 2025-10-15 19:17:36. Total running time: 6min 33s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000004 │
│ time_this_iter_s                                  62.36135 │
│ time_total_s                                     388.66614 │
│ training_iteration                                       5 │
│ accuracy                                            0.5454 │
│ loss                                               1.25763 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000004
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000004)
(func pid=3983) [9, 10000] loss: 0.175

Trial train_cifar_b12d1_00001 finished iteration 9 at 2025-10-15 19:17:41. Total running time: 6min 38s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000008 │
│ time_this_iter_s                                  33.85725 │
│ time_total_s                                     393.94617 │
│ training_iteration                                       9 │
│ accuracy                                            0.5745 │
│ loss                                               1.29404 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000008
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000008)
(func pid=3985) [6,  2000] loss: 1.208
(func pid=3985) [6,  4000] loss: 0.605
(func pid=3983) [10,  2000] loss: 0.738
(func pid=3983) [10,  4000] loss: 0.401 [repeated 2x across cluster]
(func pid=3983) [10,  6000] loss: 0.268 [repeated 2x across cluster]

Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2025-10-15 19:18:04. Total running time: 7min 1s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00001   RUNNING       256      8   0.00137364               4        9           393.946    1.29404       0.5745 │
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        5           388.666    1.25763       0.5454 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3983) [10,  8000] loss: 0.207 [repeated 2x across cluster]
(func pid=3983) [10, 10000] loss: 0.174 [repeated 2x across cluster]

Trial train_cifar_b12d1_00001 finished iteration 10 at 2025-10-15 19:18:17. Total running time: 7min 14s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                   35.9673 │
│ time_total_s                                     429.91347 │
│ training_iteration                                      10 │
│ accuracy                                            0.5855 │
│ loss                                               1.29551 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00001 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000009

Trial train_cifar_b12d1_00001 completed after 10 iterations at 2025-10-15 19:18:17. Total running time: 7min 14s
(func pid=3983) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00001_1_batch_size=4,l1=256,l2=8,lr=0.0014_2025-10-15_19-11-02/checkpoint_000009)
(func pid=3985) [6, 16000] loss: 0.147 [repeated 2x across cluster]
(func pid=3985) [6, 18000] loss: 0.131
(func pid=3985) [6, 20000] loss: 0.121

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:18:34. Total running time: 7min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        5           388.666    1.25763       0.5454 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 finished iteration 6 at 2025-10-15 19:18:38. Total running time: 7min 35s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000005 │
│ time_this_iter_s                                  61.90479 │
│ time_total_s                                     450.57093 │
│ training_iteration                                       6 │
│ accuracy                                            0.5398 │
│ loss                                               1.26131 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000005
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000005)
(func pid=3985) [7,  2000] loss: 1.170
(func pid=3985) [7,  4000] loss: 0.579
(func pid=3985) [7,  6000] loss: 0.382
(func pid=3985) [7,  8000] loss: 0.293
(func pid=3985) [7, 10000] loss: 0.235

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:19:04. Total running time: 8min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        6           450.571    1.26131       0.5398 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [7, 12000] loss: 0.198
(func pid=3985) [7, 14000] loss: 0.170
(func pid=3985) [7, 16000] loss: 0.144
(func pid=3985) [7, 18000] loss: 0.130
(func pid=3985) [7, 20000] loss: 0.118
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:19:34. Total running time: 8min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        6           450.571    1.26131       0.5398 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 finished iteration 7 at 2025-10-15 19:19:37. Total running time: 8min 34s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000006 │
│ time_this_iter_s                                  59.09131 │
│ time_total_s                                     509.66224 │
│ training_iteration                                       7 │
│ accuracy                                             0.569 │
│ loss                                               1.21514 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000006
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000006)
(func pid=3985) [8,  2000] loss: 1.118
(func pid=3985) [8,  4000] loss: 0.584
(func pid=3985) [8,  6000] loss: 0.383
(func pid=3985) [8,  8000] loss: 0.284
(func pid=3985) [8, 10000] loss: 0.228

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:20:04. Total running time: 9min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        7           509.662    1.21514       0.569  │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [8, 12000] loss: 0.192
(func pid=3985) [8, 14000] loss: 0.158
(func pid=3985) [8, 16000] loss: 0.145
(func pid=3985) [8, 18000] loss: 0.125
(func pid=3985) [8, 20000] loss: 0.116
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:20:34. Total running time: 9min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        7           509.662    1.21514       0.569  │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 finished iteration 8 at 2025-10-15 19:20:36. Total running time: 9min 33s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000007 │
│ time_this_iter_s                                  58.94575 │
│ time_total_s                                     568.60798 │
│ training_iteration                                       8 │
│ accuracy                                            0.5576 │
│ loss                                                1.2302 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000007
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000007)
(func pid=3985) [9,  2000] loss: 1.129
(func pid=3985) [9,  4000] loss: 0.559
(func pid=3985) [9,  6000] loss: 0.374
(func pid=3985) [9,  8000] loss: 0.280
(func pid=3985) [9, 10000] loss: 0.218

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:21:04. Total running time: 10min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        8           568.608    1.2302        0.5576 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [9, 12000] loss: 0.181
(func pid=3985) [9, 14000] loss: 0.158
(func pid=3985) [9, 16000] loss: 0.140
(func pid=3985) [9, 18000] loss: 0.122
(func pid=3985) [9, 20000] loss: 0.113
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:21:34. Total running time: 10min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        8           568.608    1.2302        0.5576 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_b12d1_00003 finished iteration 9 at 2025-10-15 19:21:34. Total running time: 10min 31s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000008 │
│ time_this_iter_s                                  58.67372 │
│ time_total_s                                     627.28171 │
│ training_iteration                                       9 │
│ accuracy                                            0.5845 │
│ loss                                               1.17367 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000008
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000008)
(func pid=3985) [10,  2000] loss: 1.066
(func pid=3985) [10,  4000] loss: 0.554
(func pid=3985) [10,  6000] loss: 0.362
(func pid=3985) [10,  8000] loss: 0.272
(func pid=3985) [10, 10000] loss: 0.217

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-10-15 19:22:04. Total running time: 11min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00003   RUNNING        16     16   0.000173963              2        9           627.282    1.17367       0.5845 │
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) [10, 12000] loss: 0.180
(func pid=3985) [10, 14000] loss: 0.157
(func pid=3985) [10, 16000] loss: 0.138
(func pid=3985) [10, 18000] loss: 0.120
(func pid=3985) [10, 20000] loss: 0.109

Trial train_cifar_b12d1_00003 finished iteration 10 at 2025-10-15 19:22:32. Total running time: 11min 29s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_b12d1_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                  57.82399 │
│ time_total_s                                      685.1057 │
│ training_iteration                                      10 │
│ accuracy                                            0.5662 │
│ loss                                               1.23146 │
╰────────────────────────────────────────────────────────────╯
Trial train_cifar_b12d1_00003 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000009

Trial train_cifar_b12d1_00003 completed after 10 iterations at 2025-10-15 19:22:32. Total running time: 11min 29s

Trial status: 10 TERMINATED
Current time: 2025-10-15 19:22:32. Total running time: 11min 29s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_b12d1_00000   TERMINATED      1      1   0.00255541               2        1           114.239    2.30465       0.1023 │
│ train_cifar_b12d1_00001   TERMINATED    256      8   0.00137364               4       10           429.913    1.29551       0.5855 │
│ train_cifar_b12d1_00002   TERMINATED      8      2   0.0465214                4        1            71.2591   2.35082       0.1011 │
│ train_cifar_b12d1_00003   TERMINATED     16     16   0.000173963              2       10           685.106    1.23146       0.5662 │
│ train_cifar_b12d1_00004   TERMINATED      4      8   0.0498037               16       10           177.512    2.24623       0.1416 │
│ train_cifar_b12d1_00005   TERMINATED     16      1   0.002926                 8       10           267.721    1.80538       0.2824 │
│ train_cifar_b12d1_00006   TERMINATED      2     32   0.0314836                8        1            42.436    2.3102        0.1004 │
│ train_cifar_b12d1_00007   TERMINATED      2      8   0.000201703              8        1            42.0224   2.26399       0.1687 │
│ train_cifar_b12d1_00008   TERMINATED      1     32   0.0132428               16        1            24.3124   2.30466       0.1027 │
│ train_cifar_b12d1_00009   TERMINATED     16    128   0.0250987                4        1            61.6968   2.32275       0.1027 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=3985) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-10-15_19-11-02/train_cifar_b12d1_00003_3_batch_size=2,l1=16,l2=16,lr=0.0002_2025-10-15_19-11-02/checkpoint_000009)

Best trial config: {'l1': 16, 'l2': 16, 'lr': 0.00017396312439312377, 'batch_size': 2}
Best trial final validation loss: 1.2314598659689537
Best trial final validation accuracy: 0.5662
Best trial test set accuracy: 0.5894

If you run the code, an example output could look like this:

Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... |   batch_size |   l1 |   l2 |          lr |   iter |    loss |   accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... |            2 |    1 |  256 | 0.000668163 |      1 | 2.31479 |     0.0977 |
| ... |            4 |   64 |    8 | 0.0331514   |      1 | 2.31605 |     0.0983 |
| ... |            4 |    2 |    1 | 0.000150295 |      1 | 2.30755 |     0.1023 |
| ... |           16 |   32 |   32 | 0.0128248   |     10 | 1.66912 |     0.4391 |
| ... |            4 |    8 |  128 | 0.00464561  |      2 | 1.7316  |     0.3463 |
| ... |            8 |  256 |    8 | 0.00031556  |      1 | 2.19409 |     0.1736 |
| ... |            4 |   16 |  256 | 0.00574329  |      2 | 1.85679 |     0.3368 |
| ... |            8 |    2 |    2 | 0.00325652  |      1 | 2.30272 |     0.0984 |
| ... |            2 |    2 |    2 | 0.000342987 |      2 | 1.76044 |     0.292  |
| ... |            4 |   64 |   32 | 0.003734    |      8 | 1.53101 |     0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+

Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737
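The "Best trial" lines above match the trial with the lowest last reported validation loss in the table. Below is a minimal plain-Python sketch of that selection rule, applied to the rows of the example table. The `TrialResult` class and the `best_trial` helper are hypothetical illustrations, not part of Ray Tune's API. The `lr` values are rounded as printed in the table.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    config: dict      # sampled hyperparameters
    iterations: int   # training iterations completed before the trial stopped
    loss: float       # last reported validation loss
    accuracy: float   # last reported validation accuracy


# (batch_size, l1, l2, lr, iter, loss, accuracy) copied from the example table above.
ROWS = [
    (2, 1, 256, 0.000668163, 1, 2.31479, 0.0977),
    (4, 64, 8, 0.0331514, 1, 2.31605, 0.0983),
    (4, 2, 1, 0.000150295, 1, 2.30755, 0.1023),
    (16, 32, 32, 0.0128248, 10, 1.66912, 0.4391),
    (4, 8, 128, 0.00464561, 2, 1.7316, 0.3463),
    (8, 256, 8, 0.00031556, 1, 2.19409, 0.1736),
    (4, 16, 256, 0.00574329, 2, 1.85679, 0.3368),
    (8, 2, 2, 0.00325652, 1, 2.30272, 0.0984),
    (2, 2, 2, 0.000342987, 2, 1.76044, 0.292),
    (4, 64, 32, 0.003734, 8, 1.53101, 0.4761),
]

trials = [
    TrialResult({"batch_size": bs, "l1": l1, "l2": l2, "lr": lr}, it, loss, acc)
    for bs, l1, l2, lr, it, loss, acc in ROWS
]


def best_trial(results, metric="loss", mode="min"):
    """Return the trial whose last reported `metric` is best under `mode` ("min" or "max")."""
    pick = min if mode == "min" else max
    return pick(results, key=lambda t: getattr(t, metric))


best = best_trial(trials)
print("Best trial config:", best.config)
print("Best trial final validation loss:", best.loss)
print("Best trial final validation accuracy:", best.accuracy)
```

Here, selecting by maximum accuracy picks the same trial. In general, though, the lowest-loss trial and the highest-accuracy trial need not coincide.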

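Most trials were stopped early to avoid wasting resources. The best-performing trial reached a validation accuracy of about 47% (0.4761), and its test set accuracy of 0.4737 confirms this result.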
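The early stopping is performed by the trial scheduler. The tutorial imports `ASHAScheduler`, which implements asynchronous successive halving. The sketch below shows a simplified stop rule in that spirit. The `SimpleASHA` class is hypothetical and is not Ray's implementation. Its `max_t`, `grace_period` and `reduction_factor` arguments borrow the names of `ASHAScheduler`'s parameters, but the default values here are assumptions. The rule works as follows: trials are compared only at milestone iterations, and at each milestone a trial keeps training only if its loss is among the best 1/`reduction_factor` of the losses recorded at that milestone so far.

```python
import math


class SimpleASHA:
    """Simplified asynchronous successive halving stop rule (illustration only, not Ray's code).

    Lower metric values are treated as better, as with validation loss.
    """

    def __init__(self, max_t=10, grace_period=1, reduction_factor=2):
        self.max_t = max_t
        self.rf = reduction_factor
        # Milestones grace_period * rf**k below max_t, e.g. [1, 2, 4, 8] for the defaults.
        self.milestones = []
        t = grace_period
        while t < max_t:
            self.milestones.append(t)
            t *= reduction_factor
        # Losses recorded by all trials that reached each milestone so far.
        self.rungs = {m: [] for m in self.milestones}

    def should_stop(self, iteration, loss):
        """Decide whether a trial reporting `loss` after `iteration` epochs should stop."""
        if iteration >= self.max_t:
            return True  # training budget exhausted
        if iteration not in self.rungs:
            return False  # only milestones trigger a comparison
        recorded = self.rungs[iteration]
        recorded.append(loss)
        # Keep only trials within the best ceil(n / rf) losses seen at this milestone.
        keep = math.ceil(len(recorded) / self.rf)
        cutoff = sorted(recorded)[keep - 1]
        return loss > cutoff


scheduler = SimpleASHA(max_t=10, grace_period=1, reduction_factor=2)
print(scheduler.should_stop(1, 2.30))  # first trial at milestone 1: keeps training
print(scheduler.should_stop(1, 1.90))  # best loss so far at milestone 1: keeps training
print(scheduler.should_stop(1, 2.31))  # outside the best half of [2.30, 1.90, 2.31]: stopped
```

Each decision is made as soon as a result arrives, using only the results recorded so far. This is what makes the scheme asynchronous. The trade-off is that early trials are compared against fewer peers than later ones.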

That's it! You can now tune the parameters of your PyTorch models.

Total running time of the script: (11 minutes 43.912 seconds)