CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

This criterion computes the cross entropy loss between input logits and target.

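It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.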

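The input is expected to contain the unnormalized logits for each class (which do not, in general, need to be positive or sum to 1). input has to be a Tensor of size $(C)$ for unbatched input, or of size $(minibatch, C)$ or $(minibatch, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ for batched input. The last form is useful for higher-dimensional inputs, such as computing the cross entropy loss per pixel for 2D images.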
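As a rough illustration of these input forms, the following sketch (assuming a standard PyTorch install; the sizes N, C, H, W and the random data are arbitrary) checks two properties: an unbatched (C) input gives the same loss as a batch of one, and, with no weight argument, the K-dimensional form with K = 2 matches flattening every pixel into its own row of an ordinary (N*H*W, C) problem.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
loss_fn = nn.CrossEntropyLoss()  # default: reduction='mean', no weight
N, C, H, W = 2, 4, 3, 3          # arbitrary illustrative sizes

# Unbatched input: logits of shape (C,) and a 0-dim class-index target.
x_single = torch.randn(C)
y_single = torch.tensor(2)
loss_single = loss_fn(x_single, y_single)

# The same sample as a batch of one: input (1, C), target (1,).
loss_batched = loss_fn(x_single.unsqueeze(0), y_single.unsqueeze(0))

# K-dimensional case with K = 2: per-pixel logits (N, C, H, W)
# and per-pixel class indices (N, H, W).
x_img = torch.randn(N, C, H, W)
y_img = torch.randint(0, C, (N, H, W))
loss_img = loss_fn(x_img, y_img)

# Move the class dimension last and treat every pixel as its own sample.
x_flat = x_img.permute(0, 2, 3, 1).reshape(-1, C)  # (N*H*W, C)
y_flat = y_img.reshape(-1)                          # (N*H*W,)
loss_flat = loss_fn(x_flat, y_flat)

print(torch.allclose(loss_single, loss_batched))
print(torch.allclose(loss_img, loss_flat))
```

The permute matters: the class dimension sits at position 1 in the K-dimensional layout, so it has to be moved last before reshaping, otherwise rows would mix logits from different pixels.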

The target that this criterion expects should contain either:

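Class indices in the range $[0, C)$, where $C$ is the number of classes. If ignore_index is specified, this loss also accepts that class index (which need not lie in the class range). The unreduced (i.e., with reduction set to 'none') loss for this case can be described as: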

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}$
where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then

$\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}}, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
Note that this case is equivalent to applying LogSoftmax to the input, followed by NLLLoss.
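Probabilities for each class. This is useful when labels beyond a single class per minibatch item are required, such as blended labels or label smoothing. The unreduced (i.e., with reduction set to 'none') loss for this case can be described as: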

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}$
where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then

$\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
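The two formulas can be checked numerically against the built-in module. The sketch below uses arbitrary illustrative weights and random logits (both assumptions of the example). It recomputes the weighted 'mean' for class-index targets, including one entry equal to the default ignore_index of -100, confirms the LogSoftmax plus NLLLoss equivalence, and recomputes the 'mean' for probability targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C = 6, 4
x = torch.randn(N, C)                       # logits
w = torch.tensor([1.0, 2.0, 0.5, 1.5])      # illustrative per-class weights w_c
log_p = F.log_softmax(x, dim=1)             # log(exp(x_{n,c}) / sum_i exp(x_{n,i}))

# Class-index targets; the fourth item uses the default ignore_index (-100).
y = torch.tensor([0, 3, 1, -100, 2, 3])
keep = (y != -100).float()                  # indicator 1{y_n != ignore_index}
y_safe = y.clamp(min=0)                     # placeholder index for ignored items
w_y = w[y_safe] * keep                      # w_{y_n} * 1{...}
l = -w_y * log_p[torch.arange(N), y_safe]   # per-item l_n ('none' reduction)
manual_idx = l.sum() / w_y.sum()            # weighted 'mean'
builtin_idx = nn.CrossEntropyLoss(weight=w)(x, y)

# LogSoftmax followed by NLLLoss gives the same value.
via_nll = nn.NLLLoss(weight=w)(log_p, y)

# Class-probability targets: valid distributions over the C classes.
p = torch.randn(N, C).softmax(dim=1)
l_prob = -(w * log_p * p).sum(dim=1)        # l_n = -sum_c w_c log(...) y_{n,c}
manual_prob = l_prob.sum() / N              # 'mean' divides by N, not by the weights
builtin_prob = nn.CrossEntropyLoss(weight=w)(x, p)

print(manual_idx.item(), builtin_idx.item(), via_nll.item())
print(manual_prob.item(), builtin_prob.item())
```

Note the different denominators: with class indices, the weights of the non-ignored targets normalize the mean, whereas with probabilities the mean is a plain average over N.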

Note

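The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.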
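To make the relationship between the two target forms concrete, the sketch below (random data and no weight argument, both assumptions of the example) encodes class indices as one-hot float probabilities with torch.nn.functional.one_hot and compares the two losses. Without weights they should agree up to floating-point error; with a weight tensor the two 'mean' reductions normalize differently, as the formulas above show.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C = 5, 3
x = torch.randn(N, C)
y = torch.randint(0, C, (N,))               # class indices, dtype torch.long

loss_fn = nn.CrossEntropyLoss()             # no weight, reduction='mean'
loss_idx = loss_fn(x, y)

# The same labels written as one-hot probabilities (must be floating point).
y_onehot = F.one_hot(y, num_classes=C).float()
loss_prob = loss_fn(x, y_onehot)

print(loss_idx.item(), loss_prob.item())
```

Both calls produce the same number here, so the one-hot form buys nothing; the index form is the one to prefer when each item has exactly one label.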

Parameters

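weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C.
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: $0.0$.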
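The reduction and label_smoothing parameters can be illustrated with a short sketch. It assumes class-index targets and no weight argument, and it reconstructs the smoothed target explicitly as (1 - eps) times the one-hot label plus eps / C, following the description of label_smoothing above; the value eps = 0.1 and the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C, eps = 4, 5, 0.1
x = torch.randn(N, C)
y = torch.randint(0, C, (N,))

# reduction: 'none' keeps one loss per sample; 'sum' and 'mean' collapse them.
per_sample = nn.CrossEntropyLoss(reduction="none")(x, y)   # shape (N,)
total = nn.CrossEntropyLoss(reduction="sum")(x, y)
mean = nn.CrossEntropyLoss(reduction="mean")(x, y)

# label_smoothing: mix the one-hot ground truth with the uniform distribution.
smoothed_builtin = nn.CrossEntropyLoss(label_smoothing=eps)(x, y)
q = (1 - eps) * F.one_hot(y, C).float() + eps / C          # each row sums to 1
smoothed_manual = -(q * F.log_softmax(x, dim=1)).sum(dim=1).mean()

print(per_sample.shape, total.item(), mean.item())
print(smoothed_builtin.item(), smoothed_manual.item())
```

With no weights, 'mean' here is just the sum divided by N; once a weight tensor is passed, the class-index 'mean' divides by the summed target weights instead.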

Shape

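Input: Shape $(C)$, $(N, C)$ or $(N, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Target: If containing class indices, shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss, where each value should be in $[0, C)$. The target data type must be long when using class indices. If containing class probabilities, the target must have the same shape as the input, and each value should be in $[0, 1]$; this means the target data type must be float when using class probabilities. Note that PyTorch does not strictly enforce the probability constraints on class probabilities; it is the user's responsibility to ensure that target contains valid probability distributions (see the examples section below for more details).
Output: If reduction is 'none', shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss, depending on the shape of the input. Otherwise, a scalar.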

where

$\begin{aligned} C ={} & \text{number of classes} \\ N ={} & \text{batch size} \\ \end{aligned}$
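The following sketch, using arbitrary sizes and random data, checks the target dtypes and the per-element output shapes returned with reduction='none' for each input form listed above, including a probability target with the same shape as the input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, N, H, W = 3, 2, 4, 4                 # arbitrary sizes; K = 2 spatial dims
loss_none = nn.CrossEntropyLoss(reduction="none")

cases = {
    # name: (input, class-index target)
    "unbatched": (torch.randn(C), torch.tensor(1)),
    "batched": (torch.randn(N, C), torch.randint(0, C, (N,))),
    "K=2": (torch.randn(N, C, H, W), torch.randint(0, C, (N, H, W))),
}

shapes = {}
for name, (inp, tgt) in cases.items():
    assert tgt.dtype == torch.long      # class indices must be long (int64)
    out = loss_none(inp, tgt)
    shapes[name] = tuple(out.shape)     # expected: (), (N,), (N, H, W)

# Probability targets share the input's shape and must be floating point.
inp = torch.randn(N, C, H, W)
prob = torch.randn(N, C, H, W).softmax(dim=1)   # softmax over the class dim
prob_out_shape = tuple(loss_none(inp, prob).shape)

print(shapes, prob_out_shape)
```

In every case the class dimension is the one that disappears from the output; all other dimensions are kept when reduction is 'none'.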

Examples

>>> import torch
>>> from torch import nn
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()

Note

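When target contains class probabilities, it should consist of soft labels: each target entry should represent a probability distribution over the possible classes for a given data sample, with individual probabilities in [0,1] and the whole distribution summing to 1. This is why the softmax() function is applied to target in the class-probabilities example above.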

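PyTorch does not validate whether the values provided in target lie in the range [0,1] or whether the distribution of each data sample sums to 1. No warning is raised, and it is the user's responsibility to ensure that target contains valid probability distributions. Providing arbitrary values may yield misleading loss values and unstable gradients during training.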

Examples

>>> import torch
>>> from torch import nn
>>> # Example of target with incorrectly specified class probabilities
>>> loss = nn.CrossEntropyLoss()
>>> _ = torch.manual_seed(283)
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> # Provided target class probabilities are not in range [0,1]
>>> target
tensor([[ 0.7105,  0.4446,  2.0297,  0.2671, -0.6075],
        [-1.0496, -0.2753, -0.3586,  0.9270,  1.0027],
        [ 0.7551,  0.1003,  1.3468, -0.3581, -0.9569]])
>>> # Provided target class probabilities do not sum to 1
>>> target.sum(axis=1)
tensor([2.8444, 0.2462, 0.8873])
>>> # No error message and possible misleading loss value
>>> loss(input, target).item()
4.6379876136779785
>>>
>>> # Example of target with correctly specified class probabilities
>>> # Use .softmax() to ensure true probability distribution
>>> target_new = target.softmax(dim=1)
>>> # New target class probabilities all in range [0,1]
>>> target_new
tensor([[0.1559, 0.1195, 0.5830, 0.1000, 0.0417],
        [0.0496, 0.1075, 0.0990, 0.3579, 0.3860],
        [0.2607, 0.1355, 0.4711, 0.0856, 0.0471]])
>>> # New target class probabilities sum to 1
>>> target_new.sum(axis=1)
tensor([1.0000, 1.0000, 1.0000])
>>> loss(input, target_new).item()
2.55349063873291
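Because this check is left to the user, one option is to validate soft targets before computing the loss. The sketch below uses a hypothetical helper, check_prob_target, which is not part of PyTorch; its name, its class_dim argument (1 for batched inputs, 0 for an unbatched (C) input), and the 1e-5 tolerance are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def check_prob_target(target: torch.Tensor, class_dim: int = 1, atol: float = 1e-5) -> None:
    """Hypothetical helper (not a PyTorch API): raise ValueError unless
    `target` holds valid probability distributions along `class_dim`."""
    if not target.is_floating_point():
        raise ValueError("probability targets must be floating point")
    if (target < 0).any() or (target > 1).any():
        raise ValueError("probabilities must lie in [0, 1]")
    sums = target.sum(dim=class_dim)
    if not torch.allclose(sums, torch.ones_like(sums), atol=atol):
        raise ValueError("probabilities must sum to 1 along the class dimension")

torch.manual_seed(0)
x = torch.randn(3, 5)
good = torch.randn(3, 5).softmax(dim=1)     # valid soft labels
bad = torch.full((3, 5), 0.5)               # in [0, 1], but each row sums to 2.5

check_prob_target(good)                     # passes silently
loss = nn.CrossEntropyLoss()(x, good)

try:
    check_prob_target(bad)
    rejected = False
except ValueError:
    rejected = True

print(loss.item(), rejected)
```

Running such a check on every batch costs a few reductions over the target, so it may be worth enabling only while debugging a data pipeline.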

forward(input, target)

Runs the forward pass.

Return type: Tensor
