Quickstart Guide

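This guide assumes no prior knowledge of pytorch or any other deep learning framework, but does assume some basic knowledge of neural networks. It is intended to be a quick overview of the high-level API that tinygrad provides.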

This guide is also structured as a tutorial: by the end of it, you will have a working model that can classify handwritten digits.

We need a few imports to get started:

import numpy as np
from tinygrad.helpers import Timing

Tensors

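Tensors are the base data structure in tinygrad. They can be thought of as multidimensional arrays of a specific data type. All high-level operations in tinygrad operate on these tensors.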

The Tensor class can be imported like so:

from tinygrad import Tensor

Tensors can be created from an existing data structure, such as a Python list or a NumPy ndarray:

t1 = Tensor([1, 2, 3, 4, 5])
na = np.array([1, 2, 3, 4, 5])
t2 = Tensor(na)

Tensors can also be created using one of the many factory methods:

full = Tensor.full(shape=(2, 3), fill_value=5) # create a tensor of shape (2, 3) filled with 5
zeros = Tensor.zeros(2, 3) # create a tensor of shape (2, 3) filled with 0
ones = Tensor.ones(2, 3) # create a tensor of shape (2, 3) filled with 1

full_like = Tensor.full_like(full, fill_value=2) # create a tensor of the same shape as `full` filled with 2
zeros_like = Tensor.zeros_like(full) # create a tensor of the same shape as `full` filled with 0
ones_like = Tensor.ones_like(full) # create a tensor of the same shape as `full` filled with 1

eye = Tensor.eye(3) # create a 3x3 identity matrix
arange = Tensor.arange(start=0, stop=10, step=1) # create a tensor of shape (10,) filled with values from 0 to 9

rand = Tensor.rand(2, 3) # create a tensor of shape (2, 3) filled with random values from a uniform distribution
randn = Tensor.randn(2, 3) # create a tensor of shape (2, 3) filled with random values from a standard normal distribution
uniform = Tensor.uniform(2, 3, low=0, high=10) # create a tensor of shape (2, 3) filled with random values from a uniform distribution between 0 and 10
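If you already know NumPy, the factory methods above have close NumPy counterparts. The sketch below runs NumPy only (not tinygrad) and pairs each call with the tinygrad call it resembles. The correspondence is approximate: default dtypes and random number generators differ, and NumPy takes the shape as a tuple where the tinygrad calls above pass it as separate arguments.

```python
import numpy as np

rng = np.random.default_rng(0)  # NumPy's generator; tinygrad uses its own RNG

full = np.full((2, 3), 5)                  # like Tensor.full(shape=(2, 3), fill_value=5)
zeros = np.zeros((2, 3))                   # like Tensor.zeros(2, 3)
ones = np.ones((2, 3))                     # like Tensor.ones(2, 3)

full_like = np.full_like(full, 2)          # like Tensor.full_like(full, fill_value=2)
zeros_like = np.zeros_like(full)           # like Tensor.zeros_like(full)
ones_like = np.ones_like(full)             # like Tensor.ones_like(full)

eye = np.eye(3)                            # like Tensor.eye(3)
arange = np.arange(0, 10, 1)               # like Tensor.arange(start=0, stop=10, step=1)

rand = rng.random((2, 3))                  # like Tensor.rand(2, 3); NumPy samples [0, 1)
randn = rng.standard_normal((2, 3))        # like Tensor.randn(2, 3)
uniform = rng.uniform(0, 10, size=(2, 3))  # like Tensor.uniform(2, 3, low=0, high=10)
```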

There are many more factory methods like these; you can find them in the Tensor Creation file.

All tensor creation methods accept a dtype argument that specifies the data type of the tensor. The supported dtypes can be found in dtypes.

from tinygrad import dtypes

t3 = Tensor([1, 2, 3, 4, 5], dtype=dtypes.int32)

You can perform operations on tensors like so:

t4 = Tensor([1, 2, 3, 4, 5])
t5 = (t4 + 1) * 2
t6 = (t5 * t4).relu().log_softmax()

All of these operations are lazy: they are only executed when you realize the tensor with .realize() or .numpy().

print(t6.numpy())
# [-56. -48. -36. -20.   0.]
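To make the idea of laziness concrete, here is a toy in plain Python. It is not how tinygrad implements laziness (tinygrad's real machinery is far more involved); the hypothetical LazyArray class only illustrates the principle. Each operation records a function to run later instead of computing anything, and realize() is what finally evaluates the chain, caching each result along the way.

```python
import math

class LazyArray:
    """Toy lazy array: ops build thunks; nothing runs until realize()."""
    def __init__(self, thunk):
        self._thunk = thunk   # zero-argument function that computes a list of floats
        self._value = None    # cache, filled on the first realize()

    @staticmethod
    def const(values):
        vals = [float(v) for v in values]
        return LazyArray(lambda: vals)

    def _elementwise(self, other, fn):
        if isinstance(other, LazyArray):
            return LazyArray(lambda: [fn(a, b) for a, b in zip(self.realize(), other.realize())])
        return LazyArray(lambda: [fn(a, other) for a in self.realize()])  # scalar operand

    def __add__(self, other): return self._elementwise(other, lambda a, b: a + b)
    def __mul__(self, other): return self._elementwise(other, lambda a, b: a * b)

    def relu(self):
        return LazyArray(lambda: [max(a, 0.0) for a in self.realize()])

    def log_softmax(self):
        def compute():
            xs = self.realize()
            m = max(xs)                                            # subtract max for stability
            lse = m + math.log(sum(math.exp(x - m) for x in xs))  # log-sum-exp
            return [x - lse for x in xs]
        return LazyArray(compute)

    def realize(self):
        if self._value is None:
            self._value = self._thunk()  # evaluate the recorded chain once
        return self._value

t4 = LazyArray.const([1, 2, 3, 4, 5])
t5 = (t4 + 1) * 2
t6 = (t5 * t4).relu().log_softmax()  # only builds the chain; nothing computed yet
```

Calling t6.realize() runs the whole chain once and gives approximately the same values as the print(t6.numpy()) output above, since log_softmax(x) is x minus the log-sum-exp of x.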

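Many more operations can be performed on tensors; you can find them in the Tensor Ops file. Additionally, reading through abstractions2.py will help you understand how operations on these tensors make their way down to your hardware.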

Models

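In tinygrad, neural networks are really just represented by the operations performed on tensors. These operations are commonly grouped into the __call__ method of a class, which allows these groups of operations to be modularized and reused. These classes do not need to inherit from any base class; in fact, if they don't need any trainable parameters, they don't even need to be a class!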

An example of this is the nn.Linear class, which represents a linear layer in a neural network.

class Linear:
  def __init__(self, in_features, out_features, bias=True, initialization: str='kaiming_uniform'):
    self.weight = getattr(Tensor, initialization)(out_features, in_features)
    self.bias = Tensor.zeros(out_features) if bias else None

  def __call__(self, x):
    return x.linear(self.weight.transpose(), self.bias)

More neural network modules are already implemented in nn, and you can also implement your own.
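The Linear layer stores its weight with shape (out_features, in_features) and passes the transpose to x.linear, so, as we understand Tensor.linear, the layer computes x @ weight.T + bias. The NumPy sketch below is our own illustration, not tinygrad code; it checks that arithmetic on a small example that is easy to verify by hand.

```python
import numpy as np

def linear(x, weight_t, bias=None):
    # x @ weight_t, plus bias if one is given
    out = x @ weight_t
    return out if bias is None else out + bias

weight = np.arange(6, dtype=np.float32).reshape(2, 3)  # (out_features=2, in_features=3)
bias = np.array([10.0, 20.0], dtype=np.float32)         # one bias per output feature
x = np.ones((4, 3), dtype=np.float32)                   # a batch of 4 inputs

y = linear(x, weight.T, bias)  # each row: [0+1+2, 3+4+5] + [10, 20] = [13, 32]
```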

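We will implement a simple neural network that can classify handwritten digits from the MNIST dataset. Our classifier will be a simple 2-layer neural network with a Leaky ReLU activation function. It uses a hidden layer of 128 units and an output layer of 10 units (one for each digit), with no bias on either linear layer.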

class TinyNet:
  def __init__(self):
    self.l1 = Linear(784, 128, bias=False)
    self.l2 = Linear(128, 10, bias=False)

  def __call__(self, x):
    x = self.l1(x)
    x = x.leaky_relu()
    x = self.l2(x)
    return x

net = TinyNet()

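We can see that the forward pass of our neural network is just a sequence of operations performed on the input tensor x. We can also see that functional operations like leaky_relu are not defined as classes; they are simply methods we call. Finally, we just create an instance of our neural network, and we are ready to start training it.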
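The same forward pass can be written in plain NumPy to follow the shapes: a batch of flattened 28x28 images, shape (64, 784), becomes (64, 128) after the first layer and (64, 10) logits after the second. This is an illustrative sketch with random, untrained weights, not tinygrad code. The negative slope of 0.01 is an assumption about tinygrad's leaky_relu default, and leaky_relu is written as a plain function because it has no trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# weights stored as (out_features, in_features), like net.l1.weight and net.l2.weight
w1 = (rng.standard_normal((128, 784)) * 0.01).astype(np.float32)
w2 = (rng.standard_normal((10, 128)) * 0.01).astype(np.float32)

def leaky_relu(x, neg_slope=0.01):
    # no trainable parameters, so a plain function is enough
    return np.where(x > 0, x, x * neg_slope)

def tiny_net(x):
    h = leaky_relu(x @ w1.T)  # (batch, 784) -> (batch, 128)
    return h @ w2.T           # (batch, 128) -> (batch, 10), one logit per digit

batch = rng.random((64, 784), dtype=np.float32)  # stand-in for 64 flattened images
logits = tiny_net(batch)
pred = logits.argmax(axis=-1)                     # predicted digit for each image
```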

Training

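Now that we have defined our neural network, we can start training it. Training neural networks in tinygrad is simple: define the network, define a loss function, and call .backward() on the loss to compute the gradients. The gradients can then be used to update the network's parameters with one of the many Optimizers.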

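For our loss function we will use sparse categorical cross-entropy loss. The implementation below is taken from tensor.py; it is copied here to highlight an important detail of tinygrad.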

def sparse_categorical_crossentropy(self, Y, ignore_index=-1) -> Tensor:
    loss_mask = Y != ignore_index
    y_counter = Tensor.arange(self.shape[-1], dtype=dtypes.int32, requires_grad=False, device=self.device).unsqueeze(0).expand(Y.numel(), self.shape[-1])
    y = ((y_counter == Y.flatten().reshape(-1, 1)).where(-1.0, 0) * loss_mask.reshape(-1, 1)).reshape(*Y.shape, self.shape[-1])
    return self.log_softmax().mul(y).sum() / loss_mask.sum()

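As this implementation of cross-entropy loss shows, there are certain operations that tinygrad does not support natively. Load/store operations are not supported natively because they add complexity when porting to different backends, 90% of models don't need them, and they can be implemented with an arange mask, as is done above.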
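The trade-off can be seen side by side in NumPy. Both hypothetical functions below compute the same mean negative log-likelihood while skipping labels equal to ignore_index. sparse_cce_gather picks each row's true-class log-probability with fancy indexing (an indexed load), while sparse_cce_masked follows the arange-mask approach of the tensor.py code above, comparing an arange against each label to build a selection mask. This is an illustration in NumPy, not tinygrad code.

```python
import numpy as np

def log_softmax(x):
    m = x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    return x - m - np.log(np.exp(x - m).sum(axis=-1, keepdims=True))

def sparse_cce_gather(logits, Y, ignore_index=-1):
    # "load" style (assumes 2-D logits, 1-D labels): index the true-class log-probs
    keep = Y != ignore_index
    lp = log_softmax(logits)
    picked = lp[np.arange(Y.size), np.where(keep, Y, 0)]  # dummy index 0 for ignored rows
    return -(picked * keep).sum() / keep.sum()

def sparse_cce_masked(logits, Y, ignore_index=-1):
    # arange-mask style, mirroring the tensor.py code: compare instead of indexing
    num_classes = logits.shape[-1]
    loss_mask = Y != ignore_index
    y_counter = np.broadcast_to(np.arange(num_classes), (Y.size, num_classes))
    hit = y_counter == Y.reshape(-1, 1)                      # True only at each row's label
    y = np.where(hit, -1.0, 0.0) * loss_mask.reshape(-1, 1)  # -1 at the label, 0 elsewhere
    y = y.reshape(*Y.shape, num_classes)
    return (log_softmax(logits) * y).sum() / loss_mask.sum()

rng = np.random.default_rng(0)
logits = rng.standard_normal((6, 10))
Y = np.array([3, 0, 9, -1, 5, 2])  # the -1 entry is ignored
loss_gather = sparse_cce_gather(logits, Y)
loss_masked = sparse_cce_masked(logits, Y)
```

The masked version does more arithmetic (a full comparison per class), but it only needs elementwise operations and reductions, which is the point made above.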

For our optimizer we will use classic stochastic gradient descent (SGD) with a learning rate of 3e-4.

from tinygrad.nn.optim import SGD

opt = SGD([net.l1.weight, net.l2.weight], lr=3e-4)

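We can see that we are passing the parameters of our neural network to the optimizer. This is because the optimizer needs to know which parameters to update. There is a simpler way: get_parameters(net) from tinygrad.nn.state returns a list of all the parameters in the neural network. The parameters are listed explicitly here for clarity.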
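Conceptually, get_parameters(net) finds the trainable tensors stored on the model's attributes, and SGD moves each of them a small step against its gradient: p = p - lr * grad. The plain NumPy sketch below illustrates both ideas. Param, collect_parameters, and PlainSGD are hypothetical names, not tinygrad's API; tinygrad's own get_parameters works on real Tensors, and its SGD accepts further options such as momentum, which this sketch leaves out.

```python
import numpy as np

class Param:
    """Hypothetical stand-in for a trainable tensor: a value plus its gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=np.float64)
        self.grad = None

def collect_parameters(obj):
    """Walk attributes, lists, and dicts, returning every Param found (no cycle detection)."""
    if isinstance(obj, Param):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [p for item in obj for p in collect_parameters(item)]
    if isinstance(obj, dict):
        return [p for item in obj.values() for p in collect_parameters(item)]
    if hasattr(obj, "__dict__"):
        return collect_parameters(vars(obj))
    return []  # plain values such as None or numbers hold no parameters

class PlainSGD:
    """Vanilla SGD: p = p - lr * grad, with no momentum or weight decay."""
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr
    def zero_grad(self):
        for p in self.params:
            p.grad = None
    def step(self):
        for p in self.params:
            if p.grad is not None:
                p.value -= self.lr * p.grad

class ToyLinear:
    def __init__(self, n_in, n_out, bias=True):
        self.weight = Param(np.zeros((n_out, n_in)))
        self.bias = Param(np.zeros(n_out)) if bias else None

class ToyNet:
    def __init__(self):
        self.l1 = ToyLinear(784, 128, bias=False)
        self.l2 = ToyLinear(128, 10, bias=False)

toy_net = ToyNet()
params = collect_parameters(toy_net)  # [l1.weight, l2.weight]; the None biases are skipped
opt = PlainSGD(params, lr=0.5)
for p in params:
    p.grad = np.ones_like(p.value)    # pretend backward() produced all-ones gradients
opt.step()                            # every weight moves from 0.0 to -0.5
```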

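Now that we have defined the network, the loss function, and the optimizer, all that's missing is data to train on! tinygrad has a couple of dataset loaders in /extra/datasets. We will use the MNIST dataset loader.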

from extra.datasets import fetch_mnist

Now we have everything we need to start training our neural network. We will train for 1000 steps with a batch size of 64.

We use with Tensor.train() to set the internal flag Tensor.training to True during training. On exit, the context manager restores the flag to its previous value.

X_train, Y_train, X_test, Y_test = fetch_mnist()

with Tensor.train():
  for step in range(1000):
    # random sample a batch
    samp = np.random.randint(0, X_train.shape[0], size=(64))
    batch = Tensor(X_train[samp], requires_grad=False)
    # get the corresponding labels
    labels = Tensor(Y_train[samp])

    # forward pass
    out = net(batch)

    # compute loss
    loss = sparse_categorical_crossentropy(out, labels)

    # zero gradients
    opt.zero_grad()

    # backward pass
    loss.backward()

    # update parameters
    opt.step()

    # calculate accuracy
    pred = out.argmax(axis=-1)
    acc = (pred == labels).mean()

    if step % 100 == 0:
      print(f"Step {step+1} | Loss: {loss.numpy()} | Accuracy: {acc.numpy()}")
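The with Tensor.train(): block in the loop above relies on a save-and-restore pattern. A minimal sketch of that pattern with contextlib looks like this; Flags and train_mode are hypothetical stand-ins for Tensor.training and Tensor.train(), not tinygrad's implementation.

```python
from contextlib import contextmanager

class Flags:
    training = False  # hypothetical stand-in for Tensor.training

@contextmanager
def train_mode(mode=True):
    prev = Flags.training      # remember the value on entry
    Flags.training = mode
    try:
        yield
    finally:
        Flags.training = prev  # restore it, even if the body raised
```

Because each entry restores exactly the value it saw, nested blocks unwind correctly, and the try/finally guarantees the flag is reset even if a training step raises an exception.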

Evaluation

Now that we have trained our neural network, we can evaluate it on the test set. We will use the same batch size of 64 and evaluate 1000 of those batches.

with Timing("Time: "):
  avg_acc = 0
  for step in range(1000):
    # random sample a batch
    samp = np.random.randint(0, X_test.shape[0], size=(64))
    batch = Tensor(X_test[samp], requires_grad=False)
    # get the corresponding labels
    labels = Y_test[samp]

    # forward pass
    out = net(batch)

    # calculate accuracy
    pred = out.argmax(axis=-1).numpy()
    avg_acc += (pred == labels).mean()
  print(f"Test Accuracy: {avg_acc / 1000}")

And that's it

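It is highly recommended that you look at the examples/ folder for more examples of using tinygrad. Reading tinygrad's source code is also a good way to learn how it works. In particular, the tests in test/ are a great place to see how the different operations are used and what their semantics are. A number of models are also implemented in models/, which you can use as references.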

Also, feel free to ask questions in the #learn-tinygrad channel on the Discord. Don't ask to ask, just ask!

Extras

JIT

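Additionally, the JIT can speed up the computation of certain neural networks. Currently, it does not support models with varying input sizes or non-tinygrad operations.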

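To use the JIT, we just need to add a decorator to the forward pass of our neural network and make sure that the inputs and outputs are realized tensors. In this case, we will instead create a wrapper function and decorate that wrapper, to speed up the evaluation of our neural network.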

from tinygrad import TinyJit

@TinyJit
def jit(x):
  return net(x).realize()

with Timing("Time: "):
  avg_acc = 0
  for step in range(1000):
    # random sample a batch
    samp = np.random.randint(0, X_test.shape[0], size=(64))
    batch = Tensor(X_test[samp], requires_grad=False)
    # get the corresponding labels
    labels = Y_test[samp]

    # forward pass with jit
    out = jit(batch)

    # calculate accuracy
    pred = out.argmax(axis=-1).numpy()
    avg_acc += (pred == labels).mean()
  print(f"Test Accuracy: {avg_acc / 1000}")

You will find that evaluation is much faster than before and that accelerator utilization is much higher.
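One way to see why a JIT of this kind wants fixed input sizes: after a first call it can reuse work that depended on the shapes, such as buffers sized for that input. The hypothetical ToyReplay class below imitates only that part of the idea in NumPy. It does not compile or capture kernels and is not how TinyJit works internally; it allocates its output buffer on the first call, reuses it afterwards, and rejects inputs of a different shape.

```python
import numpy as np

class ToyReplay:
    """Hypothetical toy: fixes the input shape on the first call, then reuses its buffer."""
    def __init__(self, weight):
        self.weight = weight  # (out_features, in_features)
        self.in_shape = None  # input shape seen on the first call
        self.out_buf = None   # output buffer allocated once for that shape

    def __call__(self, x):
        if self.in_shape is None:
            self.in_shape = x.shape
            dtype = np.result_type(x, self.weight)
            self.out_buf = np.empty((x.shape[0], self.weight.shape[0]), dtype=dtype)
        elif x.shape != self.in_shape:
            raise ValueError(f"prepared for input shape {self.in_shape}, got {x.shape}")
        # write into the same preallocated buffer on every call
        np.matmul(x, self.weight.T, out=self.out_buf)
        return self.out_buf

rng = np.random.default_rng(0)
w = rng.standard_normal((10, 784)).astype(np.float32)  # stand-in weight, (out, in)
toy = ToyReplay(w)
first = toy(np.ones((64, 784), dtype=np.float32))      # first call sizes the buffer
```

Because the toy returns the same buffer every time, copy a result before the next call if you need to keep it.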

Saving and Loading Models

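The standard weight format for tinygrad is safetensors. This means you can load the weights of any model saved in safetensors format into tinygrad. state.py provides functions for saving and loading models in this format.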

from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict

# first we need the state dict of our model
state_dict = get_state_dict(net)

# then we can just save it to a file
safe_save(state_dict, "model.safetensors")

# and load it back in
state_dict = safe_load("model.safetensors")
load_state_dict(net, state_dict)
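The safetensors layout itself is simple: an 8-byte little-endian unsigned integer giving the header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, and then the raw tensor bytes. The hypothetical serialize_f32 and deserialize_f32 functions below are a minimal, float32-only sketch of that layout using only the Python standard library. They are not tinygrad's safe_save and safe_load, which handle many dtypes and work with tinygrad Tensors.

```python
import json
import struct

def serialize_f32(tensors):
    """Pack {name: (shape, flat list of floats)} into safetensors-layout bytes (F32 only)."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": list(shape),
                        "data_offsets": [offset, offset + len(data)]}
        chunks.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)      # pad header with spaces to 8 bytes
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)

def deserialize_f32(blob):
    """Inverse of serialize_f32; rejects dtypes other than F32."""
    (n,) = struct.unpack("<Q", blob[:8])  # u64 little-endian header size
    header = json.loads(blob[8:8 + n])
    data = blob[8 + n:]                   # data_offsets are relative to this buffer
    out = {}
    for name, info in header.items():
        if name == "__metadata__":        # optional string-to-string metadata
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: unsupported dtype {info['dtype']}")
        begin, end = info["data_offsets"]
        values = struct.unpack(f"<{(end - begin) // 4}f", data[begin:end])
        out[name] = (tuple(info["shape"]), list(values))
    return out

blob = serialize_f32({"l1.weight": ((2, 2), [0.5, -1.0, 2.0, 0.25])})
restored = deserialize_f32(blob)
```

To store the result on disk, write blob to a file opened in binary mode.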

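Many of the models in the models/ folder have a load_from_pretrained method that downloads and loads the weights for you. These are usually PyTorch weights, which means you need PyTorch installed to load them.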

Environment Variables

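A number of environment variables control tinygrad's runtime behavior. Some of the common ones are DEBUG and the variables that enable the different backends.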

You can find the full list, with descriptions, in env_vars.md.

Visualizing the Computation Graph

You can visualize the computation graph of a neural network by setting VIZ=1.