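Introduction to PyTorch#

Created On: Apr 08, 2017 | Last Updated: Jun 24, 2021 | Last Verified: Nov 05, 2024

Introduction to Torch's Tensor Library#

All of deep learning is computation on tensors, which generalize matrices to objects that can be indexed in more than 2 dimensions. We will see exactly what this means in depth later. First, let's look at what we can do with tensors.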

# Author: Robert Guthrie

import torch

torch.manual_seed(1)

<torch._C.Generator object at 0x7f4a991809b0>

Creating Tensors#

Tensors can be created from Python lists with the `torch.tensor()` function.

# torch.tensor(data) creates a torch.Tensor object with the given data.
V_data = [1., 2., 3.]
V = torch.tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6.]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)

tensor([1., 2., 3.])
tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])

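What is a 3D tensor anyway? Think about it like this. If you have a vector, indexing into the vector gives you a scalar. If you have a matrix, indexing into the matrix gives you a vector. If you have a 3D tensor, then indexing into the tensor gives you a matrix!

A note on terminology: when I say "tensor" in this tutorial, it refers to any `torch.Tensor` object. Matrices and vectors are special cases of `torch.Tensor`, with 2 and 1 dimensions respectively. When I am talking about 3D tensors, I will explicitly use the term "3D tensor".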

# Index into V and get a scalar (0 dimensional tensor)
print(V[0])
# Get a Python number from it
print(V[0].item())

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])

tensor(1.)
1.0
tensor([1., 2., 3.])
tensor([[1., 2.],
        [3., 4.]])
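
As a small illustration of the "indexing drops one dimension" idea, the sketch below rebuilds the three tensors from above and prints the number of dimensions and the shape before and after indexing. It is an added example, not part of the original tutorial code; `dim()` and `shape` are standard `torch.Tensor` members.

```python
import torch

V = torch.tensor([1., 2., 3.])
M = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
T = torch.tensor([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]]])

for name, t in [("V", V), ("M", M), ("T", T)]:
    # t.dim() is the number of dimensions; t[0] has exactly one fewer
    print(name, t.dim(), tuple(t.shape), "->", t[0].dim(), tuple(t[0].shape))
```

Indexing a vector leaves a 0-dimensional tensor (a scalar), which is why `.item()` is needed to get a plain Python number out of it.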

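You can also create tensors of other data types. To create a tensor of integer type, try `torch.tensor([[1, 2], [3, 4]])` (where all elements in the list are ints). You can also specify a data type by passing `dtype=torch.data_type`, where `torch.data_type` stands for a concrete type such as `torch.float32`. Check the documentation for more data types, but Float and Long will be the most common.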
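
The following sketch shows how the data type is inferred from the input and how `dtype=` overrides it. The variable names `a` through `d` are only for illustration; the comments assume the default floating-point type has not been changed with `torch.set_default_dtype`.

```python
import torch

# All-integer input is inferred as a 64-bit integer (Long) tensor
a = torch.tensor([[1, 2], [3, 4]])
print(a.dtype)

# Floating-point input is inferred as the default float type
# (torch.float32 unless the default has been changed)
b = torch.tensor([1., 2., 3.])
print(b.dtype)

# Passing dtype= overrides the inference
c = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(c.dtype)

# .long() and .float() convert an existing tensor and return a new one
d = c.long()
print(d.dtype)
```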

You can create a tensor of a given shape filled with random data using `torch.randn()`.

x = torch.randn((3, 4, 5))
print(x)

tensor([[[-1.5256, -0.7502, -0.6540, -1.6095, -0.1002],
         [-0.6092, -0.9798, -1.6091, -0.7121,  0.3037],
         [-0.7773, -0.2515, -0.2223,  1.6871,  0.2284],
         [ 0.4676, -0.6970, -1.1608,  0.6995,  0.1991]],

        [[ 0.8657,  0.2444, -0.6629,  0.8073,  1.1017],
         [-0.1759, -2.2456, -1.4465,  0.0612, -0.6177],
         [-0.7981, -0.1316,  1.8793, -0.0721,  0.1578],
         [-0.7735,  0.1991,  0.0457,  0.1530, -0.4757]],

        [[-0.1110,  0.2927, -0.1578, -0.0288,  0.4533],
         [ 1.1422,  0.2486, -1.7754, -0.0255, -1.0233],
         [-0.5962, -1.0055,  0.4285,  1.4761, -1.7869],
         [ 1.6103, -0.7040, -0.1853, -0.9962, -0.8313]]])

Operations with Tensors#

You can operate on tensors in the ways you would expect.

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
z = x + y
print(z)

tensor([5., 7., 9.])

See the documentation for a complete list of the many operations available to you. They go well beyond mathematical operations.
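
To give a flavor of that list, here is a small added sample of other common operations on the same kind of small tensors used above. It is only a selection chosen for illustration, not a summary of the documentation.

```python
import torch

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
M = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print(x * y)            # elementwise product
print(torch.dot(x, y))  # inner product of two vectors
print(M @ x)            # matrix-vector product
print(x.max())          # reduction to a single value
print(x > 1.5)          # comparison, returns a boolean tensor
```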

One helpful operation that we will make use of later is concatenation.

# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

# If your tensors are not compatible, torch will complain.  Uncomment to see the error
# torch.cat([x_1, x_2])

tensor([[-0.8029,  0.2366,  0.2857,  0.6898, -0.6331],
        [ 0.8795, -0.6842,  0.4533,  0.2912, -0.8317],
        [-0.5525,  0.6355, -0.3968, -0.6571, -1.6428],
        [ 0.9803, -0.0421, -0.8206,  0.3133, -1.1352],
        [ 0.3773, -0.2824, -2.5667, -1.4303,  0.5009]])
tensor([[ 0.5438, -0.4057,  1.1341, -0.1473,  0.6272,  1.0935,  0.0939,  1.2381],
        [-1.1115,  0.3501, -0.7703, -1.3459,  0.5119, -0.6933, -0.1668, -0.9999]])
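
The rule behind the example above is that all dimensions except the one being concatenated along must match. The sketch below (an added illustration that reuses the same shapes) prints the resulting shapes and catches the error from the incompatible case instead of leaving it commented out. The flag `cat_failed` is a name introduced here only to record what happened.

```python
import torch

x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)

# Along axis 0 the row counts add up: 2 + 3 = 5 rows of 5 columns
print(torch.cat([x_1, y_1]).shape)

# Along axis 1 the column counts add up: 3 + 5 = 8 columns in 2 rows
print(torch.cat([x_2, y_2], dim=1).shape)

# (2, 5) and (2, 3) differ in a dimension other than axis 0, so this fails
try:
    torch.cat([x_1, x_2])
    cat_failed = False
except RuntimeError as err:
    cat_failed = True
    print("cat failed:", err)
```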

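Reshaping Tensors#

Use the `.view()` method to reshape a tensor. This method receives heavy use, because many neural network components expect their inputs to have a certain shape. Often you will need to reshape your data before passing it to the component.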

x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12))  # Reshape to 2 rows, 12 columns
# Same as above.  If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))

tensor([[[ 0.4175, -0.2127, -0.8400, -0.4200],
         [-0.6240, -0.9773,  0.8748,  0.9873],
         [-0.0594, -2.4919,  0.2423,  0.2883]],

        [[-0.1095,  0.3126,  1.5038,  0.5038],
         [ 0.6223, -0.4481, -0.2856,  0.3880],
         [-1.1435, -0.6512, -0.1032,  0.6937]]])
tensor([[ 0.4175, -0.2127, -0.8400, -0.4200, -0.6240, -0.9773,  0.8748,  0.9873,
         -0.0594, -2.4919,  0.2423,  0.2883],
        [-0.1095,  0.3126,  1.5038,  0.5038,  0.6223, -0.4481, -0.2856,  0.3880,
         -1.1435, -0.6512, -0.1032,  0.6937]])
tensor([[ 0.4175, -0.2127, -0.8400, -0.4200, -0.6240, -0.9773,  0.8748,  0.9873,
         -0.0594, -2.4919,  0.2423,  0.2883],
        [-0.1095,  0.3126,  1.5038,  0.5038,  0.6223, -0.4481, -0.2856,  0.3880,
         -1.1435, -0.6512, -0.1032,  0.6937]])
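
Two properties of `.view()` are worth seeing in code: the result shares its data with the original tensor, and the total number of elements must stay the same. The sketch below is an added illustration; it uses `torch.arange` instead of random data so the values are easy to follow, and `view_failed` is a name introduced here only to record the error.

```python
import torch

x = torch.arange(24.).view(2, 3, 4)  # values 0..23 arranged as 2x3x4
flat = x.view(2, -1)                 # -1 is inferred as 24 / 2 = 12
print(flat.shape)

# A view shares storage with the original, so a write through one
# is visible through the other
flat[0, 0] = 100.
print(x[0, 0, 0])

# The element count must match: 5 * 5 = 25 is not 24
try:
    x.view(5, 5)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)
```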

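Computation Graphs and Automatic Differentiation#

The concept of a computation graph is essential to efficient deep learning programming, because it frees you from writing the backpropagation gradients yourself. A computation graph is simply a specification of how your data is combined to give you the output. Since the graph fully specifies which parameters were involved in which operations, it contains enough information to compute derivatives. This probably sounds vague, so let's see what is going on using the fundamental flag `requires_grad`.

First, think from a programmer's perspective. What is stored in the `torch.Tensor` objects we were creating above? Obviously the data and the shape, and maybe a few other things. But when we added two tensors together, we got an output tensor. All this output tensor knows is its own data and shape. It has no idea that it was the sum of two other tensors (it could have been read in from a file, it could be the result of some other operation, and so on).

If `requires_grad=True`, the Tensor object keeps track of how it was created. Let's see it in action.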

# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3.], requires_grad=True)

# With requires_grad=True, you can still do all the operations you previously
# could
y = torch.tensor([4., 5., 6.], requires_grad=True)
z = x + y
print(z)

# BUT z knows something extra.
print(z.grad_fn)

tensor([5., 7., 9.], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f4a3e1dc6d0>

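So tensors know what created them. z knows that it wasn't read in from a file, and that it wasn't the result of a multiplication or an exponential or anything else. And if you keep following `z.grad_fn`, you will find yourself at x and y.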
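
One way to actually follow `z.grad_fn` back to x and y is sketched below. It relies on the `next_functions` attribute of autograd nodes and on the `variable` attribute of `AccumulateGrad` nodes; these are introspection hooks that are handy for debugging, and we are assuming here that they behave as in current PyTorch releases rather than treating them as a stable public interface.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
z = x + y

# Tensors created directly by the user are "leaves" and have no grad_fn
print(x.grad_fn, x.is_leaf)

# Each entry of next_functions is a (node, input_index) pair
for node, _ in z.grad_fn.next_functions:
    # For a leaf tensor the node is an AccumulateGrad object, and its
    # .variable attribute refers to that leaf tensor
    print(type(node).__name__, node.variable)
```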

But how does that help us compute a gradient?

# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)

tensor(21., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x7f4a3e1dfee0>

So now, what is the derivative of this sum with respect to the first component of x? In math, we want

\[\frac{\partial s}{\partial x_0}\]

Well, s knows that it was created by summing the entries of the tensor z. z knows that it was the sum x + y. So

\[s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$} \]

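Only the first term depends on \(x_0\), and it does so with coefficient 1. So s contains enough information to determine that the derivative we want is 1!

Of course this glosses over the challenge of how to actually compute that derivative. The point here is that s carries along enough information that it is possible to compute it. In reality, the developers of PyTorch program the `sum()` and `+` operations to know how to compute their gradients, and run the backpropagation algorithm. An in-depth discussion of that algorithm is beyond the scope of this tutorial.

Let's have PyTorch compute the gradient, and see that we were right. (Note that if you run this block multiple times, the gradient will grow. That is because PyTorch accumulates the gradient into the `.grad` attribute, which is very convenient for many models.)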

# calling .backward() on any variable will run backprop, starting from it.
s.backward()
print(x.grad)

tensor([1., 1., 1.])
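
The accumulation behaviour mentioned above can be seen in a short added sketch. It rebuilds the graph for each backward pass (so it does not depend on whether an old graph can be reused) and stores copies of the gradient in `g1`, `g2`, and `g3`, names that are introduced here purely for illustration.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)

# First pass: build the graph and backpropagate
(x + y).sum().backward()
g1 = x.grad.clone()
print(g1)  # tensor([1., 1., 1.])

# Second pass: the new gradient is added to what is already in x.grad
(x + y).sum().backward()
g2 = x.grad.clone()
print(g2)  # tensor([2., 2., 2.])

# Zero the accumulated gradient in place before the next pass
x.grad.zero_()
(x + y).sum().backward()
g3 = x.grad.clone()
print(g3)  # tensor([1., 1., 1.])
```

This is why training loops typically reset gradients once per step, for example with an optimizer's `zero_grad()` method.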

Understanding what is going on in the block below is crucial for being a successful programmer in deep learning.

x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print(z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print(z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(z.requires_grad)

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(new_z.grad_fn)
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history

False False
None
<AddBackward0 object at 0x7f4a3e1df700>
True
None
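
The comment about shared storage in the block above can be checked directly. The added sketch below compares data pointers with `data_ptr()` and contrasts `detach()` with `detach().clone()`, which makes an independent copy; the tensors used here are small stand-ins, not the ones from the tutorial block.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
z = x * 2
new_z = z.detach()

# The detached tensor has no history ...
print(new_z.requires_grad, new_z.grad_fn)  # False None
# ... but it points at the same memory as z
print(new_z.data_ptr() == z.data_ptr())    # True

# clone() after detach() gives a copy with its own memory
independent = z.detach().clone()
print(independent.data_ptr() == z.data_ptr())  # False

# z itself keeps its history, so gradients still flow back to x
z.sum().backward()
print(x.grad)  # every entry is 2
```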

You can also stop autograd from tracking history on tensors with `requires_grad=True` by wrapping the code block in `with torch.no_grad():`.

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False
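
A slightly longer added sketch makes two points explicit: `torch.no_grad()` only affects results computed inside it and leaves `x` itself untouched, and it can also be applied as a decorator. The helper `square` is a hypothetical function introduced only for this example.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)

with torch.no_grad():
    y = x ** 2
# The result computed inside the block carries no history
print(y.requires_grad, y.grad_fn)  # False None

# x is unchanged, and tracking resumes outside the block
print(x.requires_grad)  # True
z = x ** 2
print(z.requires_grad)  # True

# torch.no_grad() also works as a function decorator
@torch.no_grad()
def square(t):
    return t ** 2

print(square(x).requires_grad)  # False
```

This pattern is commonly used for evaluation code, where gradients are not needed and skipping the bookkeeping saves memory.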
