Complex Ops

Reduce

sum

sum(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    dtype: DTypeLike | None = None,
) -> Tensor

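Returns the sum of the elements of the tensor along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the axis along which the sum is computed and whether the reduced dimensions are retained.

You can pass in the dtype keyword argument to control the data type of the accumulation. If not specified, the accumulation data type is chosen based on the input tensor's data type.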

t = Tensor.arange(6).reshape(2, 3)
print(t.numpy())

[[0 1 2]
 [3 4 5]]

print(t.sum().numpy())

15

print(t.sum(axis=0).numpy())

[3 5 7]

print(t.sum(axis=1).numpy())

[ 3 12]
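To make the roles of axis and keepdim concrete, here is a minimal pure-Python sketch of a 2-D sum reduction on nested lists. The helper name reduce_sum_2d is hypothetical and is not part of tinygrad; it only mirrors the shapes shown in the example above.

```python
def reduce_sum_2d(rows, axis=None, keepdim=False):
    """Sum a rectangular list of lists along `axis` (None, 0, or 1)."""
    if axis is None:
        total = sum(sum(row) for row in rows)
        return [[total]] if keepdim else total
    if axis == 0:
        out = [sum(col) for col in zip(*rows)]  # collapse rows: one value per column
        return [out] if keepdim else out
    if axis == 1:
        out = [sum(row) for row in rows]        # collapse columns: one value per row
        return [[v] for v in out] if keepdim else out
    raise ValueError("axis must be None, 0, or 1 for a 2-D input")


t = [[0, 1, 2], [3, 4, 5]]
print(reduce_sum_2d(t))                        # 15
print(reduce_sum_2d(t, axis=0))                # [3, 5, 7]
print(reduce_sum_2d(t, axis=1, keepdim=True))  # [[3], [12]]
```

With keepdim=True the reduced axis stays in the result with length 1, which keeps the result broadcastable against the input.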

Source code in tinygrad/tensor.py

def sum(self, axis:int|Sequence[int]|None=None, keepdim=False, dtype:DTypeLike|None=None) -> Tensor:
  """
  Returns the sum of the elements of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  You can pass in `dtype` keyword argument to control the data type of the accumulation.
  If not specified, the accumulation data type is chosen based on the input tensor's data type.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(6).reshape(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum(axis=1).numpy())
  ```
  """
  ret = self.cast(sum_acc_dtype(self.dtype) if dtype is None else dtype)._reduce(Ops.ADD, axis, keepdim)
  return ret.cast(self.dtype) if dtype is None and self.dtype in (dtypes.float16, dtypes.bfloat16) else ret

prod

prod(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    dtype: DTypeLike | None = None,
) -> Tensor

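Returns the product of the elements of the tensor along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the axis along which the product is computed and whether the reduced dimensions are retained.

You can pass in the dtype keyword argument to control the data type of the accumulation. If not specified, the accumulation data type is chosen based on the input tensor's data type.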

t = Tensor([-1, -2, -3, 1, 2, 3]).reshape(2, 3)
print(t.numpy())

[[-1 -2 -3]
 [ 1  2  3]]

print(t.prod().numpy())

-36

print(t.prod(axis=0).numpy())

[-1 -4 -9]

print(t.prod(axis=1).numpy())

[-6  6]

Source code in tinygrad/tensor.py

def prod(self, axis:int|Sequence[int]|None=None, keepdim=False, dtype:DTypeLike|None=None) -> Tensor:
  """
  Returns the product of the elements of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  You can pass in `dtype` keyword argument to control the data type of the accumulation.
  If not specified, the accumulation data type is chosen based on the input tensor's data type.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([-1, -2, -3, 1, 2, 3]).reshape(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod(axis=1).numpy())
  ```
  """
  return self.cast(dtype if dtype is not None else self.dtype)._reduce(Ops.MUL, axis, keepdim)

max

max(
    axis: int | Sequence[int] | None = None, keepdim=False
) -> Tensor

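Returns the maximum value of the tensor along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the axis along which the maximum is computed and whether the reduced dimensions are retained.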

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())

[[1 0 2]
 [5 4 3]]

print(t.max().numpy())

5

print(t.max(axis=0).numpy())

[5 4 3]

print(t.max(axis=1, keepdim=True).numpy())

[[2]
 [5]]

Source code in tinygrad/tensor.py

def max(self, axis:int|Sequence[int]|None=None, keepdim=False) -> Tensor:
  """
  Returns the maximum value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max(axis=1, keepdim=True).numpy())
  ```
  """
  return self._reduce(Ops.MAX, axis, keepdim)

min

min(
    axis: int | Sequence[int] | None = None, keepdim=False
) -> Tensor

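Returns the minimum value of the tensor along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the axis along which the minimum is computed and whether the reduced dimensions are retained.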

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())

[[1 0 2]
 [5 4 3]]

print(t.min().numpy())

0

print(t.min(axis=0).numpy())

[1 0 2]

print(t.min(axis=1, keepdim=True).numpy())

[[0]
 [3]]
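The source listed for min computes it as an inverse, a max, and another inverse. A minimal sketch of that idea follows, assuming the private _inverse helper behaves as an order-reversing map; plain negation is used here as a stand-in, and min_via_max is a hypothetical name.

```python
def min_via_max(values):
    """Compute the minimum as -max(-v), reusing a max reduction."""
    return -max(-v for v in values)


print(min_via_max([1, 0, 2]))  # 0
print(min_via_max([5, 4, 3]))  # 3
```

The benefit of this pattern is that only one primitive reduction (max) has to be implemented; min falls out of it for free.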

Source code in tinygrad/tensor.py

def min(self, axis:int|Sequence[int]|None=None, keepdim=False) -> Tensor:
  """
  Returns the minimum value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the minimum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min(axis=1, keepdim=True).numpy())
  ```
  """
  return self._inverse().max(axis=axis, keepdim=keepdim)._inverse()

any

any(
    axis: int | Sequence[int] | None = None, keepdim=False
) -> Tensor

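Tests if any element evaluates to True along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the reduce axis and whether the reduced dimensions are retained.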

t = Tensor([[True, True], [True, False], [False, False]])
print(t.numpy())

[[ True  True]
 [ True False]
 [False False]]

print(t.any().numpy())

True

print(t.any(axis=0).numpy())

[ True  True]

print(t.any(axis=1, keepdim=True).numpy())

[[ True]
 [ True]
 [False]]

Source code in tinygrad/tensor.py

def any(self, axis:int|Sequence[int]|None=None, keepdim=False) -> Tensor:
  """
  Tests if any element evaluates to `True` along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[True, True], [True, False], [False, False]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any(axis=1, keepdim=True).numpy())
  ```
  """
  return self.bool().max(axis, keepdim)

all

all(
    axis: int | Sequence[int] | None = None, keepdim=False
) -> Tensor

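Tests if all elements evaluate to True along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the reduce axis and whether the reduced dimensions are retained.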

t = Tensor([[True, True], [True, False], [False, False]])
print(t.numpy())

[[ True  True]
 [ True False]
 [False False]]

print(t.all().numpy())

False

print(t.all(axis=0).numpy())

[False False]

print(t.all(axis=1, keepdim=True).numpy())

[[ True]
 [False]
 [False]]
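The listed sources express any as a max over booleans and all through De Morgan's law, all(x) = not any(not x). A small pure-Python sketch of both identities follows; any_reduce and all_reduce are hypothetical names used only for illustration.

```python
def any_reduce(values):
    # On booleans, "any" is the maximum: True as soon as one element is True.
    return max(bool(v) for v in values)


def all_reduce(values):
    # De Morgan: all(x) == not any(not x)
    return not any_reduce(not v for v in values)


rows = [[True, True], [True, False], [False, False]]
print([any_reduce(r) for r in rows])  # [True, True, False]
print([all_reduce(r) for r in rows])  # [True, False, False]
```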

Source code in tinygrad/tensor.py

def all(self, axis:int|Sequence[int]|None=None, keepdim=False) -> Tensor:
  """
  Tests if all element evaluates to `True` along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[True, True], [True, False], [False, False]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all(axis=1, keepdim=True).numpy())
  ```
  """
  return self.logical_not().any(axis, keepdim).logical_not()

isclose

isclose(
    other: Tensor,
    rtol: float = 1e-05,
    atol: float = 1e-08,
    equal_nan=False,
) -> Tensor

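Returns a new tensor with an element-wise comparison of closeness to other within a tolerance.

The rtol and atol keyword arguments control the relative and absolute tolerance of the comparison.

By default, two NaN values are not close to each other. If equal_nan is True, two NaN values are considered close.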

print(Tensor([1e-7, 1e-8, 1e-9, float('nan')]).isclose(Tensor([0.0, 0.0, 0.0, float('nan')])).numpy())

[False  True  True False]

print(Tensor([float('nan')]).isclose(Tensor([float('nan')]), equal_nan=True).numpy())

[ True]
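The tolerance test in the listed source is |self - other| <= atol + rtol * |other|, with separate handling for infinities and NaN. The scalar sketch below restates that rule in pure Python; isclose_scalar is a hypothetical helper, not a tinygrad function.

```python
import math


def isclose_scalar(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Scalar version of the closeness rule; note that rtol scales with |b| only."""
    finite_close = (math.isfinite(a) and math.isfinite(b)
                    and abs(a - b) <= atol + rtol * abs(b))
    infinite_close = (math.isinf(a) or math.isinf(b)) and a == b
    nan_close = math.isnan(a) and math.isnan(b) and equal_nan
    return finite_close or infinite_close or nan_close


nan = float("nan")
print([isclose_scalar(x, y) for x, y in zip([1e-7, 1e-8, 1e-9, nan], [0.0, 0.0, 0.0, nan])])
print(isclose_scalar(nan, nan, equal_nan=True))
```

Because the relative term uses only the magnitude of other, the comparison is not symmetric in its two arguments when rtol is large.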

Source code in tinygrad/tensor.py

def isclose(self, other:Tensor, rtol:float=1e-05, atol:float=1e-08, equal_nan=False) -> Tensor:
  """
  Returns a new tensor with element-wise comparison of closeness to `other` within a tolerance.

  The `rtol` and `atol` keyword arguments control the relative and absolute tolerance of the comparison.

  By default, two `NaN` values are not close to each other. If `equal_nan` is `True`, two `NaN` values are considered close.

  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor([1e-7, 1e-8, 1e-9, float('nan')]).isclose(Tensor([0.0, 0.0, 0.0, float('nan')])).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor([float('nan')]).isclose(Tensor([float('nan')]), equal_nan=True).numpy())
  ```
  """
  is_finite_close = self.isfinite() & other.isfinite() & ((self - other).abs() <= atol + rtol * other.abs())
  is_infinite_close = (self.isinf() | other.isinf()) & (self == other)
  is_nan_close = (self.isnan() & other.isnan()) & equal_nan
  return is_finite_close | is_infinite_close | is_nan_close

mean

mean(
    axis: int | Sequence[int] | None = None, keepdim=False
) -> Tensor

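Returns the mean value of the tensor along the specified axis or axes.

You can pass in the axis and keepdim keyword arguments to control the axis along which the mean is computed and whether the reduced dimensions are retained.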

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())

[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]

print(t.mean().numpy())

2.5907671

print(t.mean(axis=0).numpy())

[2.6623 2.4031 2.707 ]

print(t.mean(axis=1).numpy())

[2.833  2.3485]

Source code in tinygrad/tensor.py

def mean(self, axis:int|Sequence[int]|None=None, keepdim=False) -> Tensor:
  """
  Returns the mean value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the mean is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean(axis=1).numpy())
  ```
  """
  output_dtype = self.dtype if dtypes.is_float(self.dtype) else dtypes.float32
  numerator = self.cast(sum_acc_dtype(self.dtype)).sum(axis=axis, keepdim=keepdim)
  return numerator.div(prod([cast(int, si) for si, so in zip(self.shape, self.sum(axis=axis, keepdim=True).shape) if resolve(si != so)])) \
    .cast(output_dtype)

var

var(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    correction=1,
) -> Tensor

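Returns the variance of the tensor along the specified axis or axes.

You can pass in the axis, keepdim, and correction keyword arguments to control the axis along which the variance is computed, whether the reduced dimensions are retained, and the Bessel's correction to apply (correction=1 divides by N - 1, correction=0 divides by N).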

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())

[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]

print(t.var().numpy())

0.109925404

print(t.var(axis=0).numpy())

[0.2134 0.2189 0.0096]

print(t.var(axis=1).numpy())

[0.0187 0.08  ]
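The effect of correction can be checked by hand. The sketch below is a pure-Python restatement of the formula in the listed source (sum of squared deviations divided by max(0, N - correction)), applied to the six printed values above; because those values are rounded to four decimals, the results agree with the tensor output only approximately. The function name variance is hypothetical.

```python
import math
import statistics


def variance(values, correction=1):
    """Sum of squared deviations divided by max(0, N - correction)."""
    n = len(values)
    mean = sum(values) / n
    squares = sum((v - mean) ** 2 for v in values)
    # Plain Python raises ZeroDivisionError if N - correction <= 0.
    return squares / max(0, n - correction)


flat = [2.9889, 2.7339, 2.7763, 2.3356, 2.0722, 2.6376]
print(variance(flat))                # close to 0.1099 (sample variance)
print(variance(flat, correction=0))  # population variance, smaller
print(math.sqrt(variance(flat)))     # close to 0.3316 (standard deviation)
```

With correction=1 the result matches statistics.variance, and with correction=0 it matches statistics.pvariance.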

Source code in tinygrad/tensor.py

def var(self, axis:int|Sequence[int]|None=None, keepdim=False, correction=1) -> Tensor:
  """
  Returns the variance of the tensor along the specified axis or axes.

  You can pass in `axis`, `keepdim`, and `correction` keyword arguments to control the axis along
  which the variance is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var(axis=1).numpy())
  ```
  """
  squares = (self - self.mean(axis=axis, keepdim=True)).square()
  n = prod([si for si, so in zip(self.shape, squares.sum(axis=axis, keepdim=True).shape) if resolve(si != so)])
  return squares.sum(axis=axis, keepdim=keepdim).div(smax([0, n-correction]))

var_mean

var_mean(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    correction=1,
) -> tuple[Tensor, Tensor]

Calculates the variance and mean over the dimensions specified by axis. Syntactic sugar around Tensor.var and Tensor.mean to match torch.var_mean.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())

[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]

var, mean = t.var_mean()
print(var.numpy(), mean.numpy())

0.109925404 2.5907671

Source code in tinygrad/tensor.py

def var_mean(self, axis:int|Sequence[int]|None=None, keepdim=False, correction=1) -> tuple[Tensor, Tensor]:
  """
  Calculates the variance and mean over the dimensions specified by dim.
  Syntactic sugar around `Tensor.var` and `Tensor.mean` to match `torch.var_mean`.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  var, mean = t.var_mean()
  print(var.numpy(), mean.numpy())
  ```
  """
  return self.var(axis, keepdim, correction), self.mean(axis, keepdim)

std

std(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    correction=1,
) -> Tensor

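Returns the standard deviation of the tensor along the specified axis or axes.

You can pass in the axis, keepdim, and correction keyword arguments to control the axis along which the standard deviation is computed, whether the reduced dimensions are retained, and the Bessel's correction to apply.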

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())

[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]

print(t.std().numpy())

0.33155

print(t.std(axis=0).numpy())

[0.462  0.4679 0.0981]

print(t.std(axis=1).numpy())

[0.1367 0.2829]

Source code in tinygrad/tensor.py

def std(self, axis:int|Sequence[int]|None=None, keepdim=False, correction=1) -> Tensor:
  """
  Returns the standard deviation of the tensor along the specified axis or axes.

  You can pass in `axis`, `keepdim`, and `correction` keyword arguments to control the axis along
  which the standard deviation is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std(axis=1).numpy())
  ```
  """
  return self.var(axis, keepdim, correction).sqrt()

std_mean

std_mean(
    axis: int | Sequence[int] | None = None,
    keepdim=False,
    correction=1,
) -> tuple[Tensor, Tensor]

Calculates the standard deviation and mean over the dimensions specified by axis. Syntactic sugar around Tensor.std and Tensor.mean to match torch.std_mean.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())

[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]

std, mean = t.std_mean()
print(std.numpy(), mean.numpy())

0.33155 2.5907671

Source code in tinygrad/tensor.py

def std_mean(self, axis:int|Sequence[int]|None=None, keepdim=False, correction=1) -> tuple[Tensor, Tensor]:
  """
  Calculates the standard deviation and mean over the dimensions specified by dim.
  Syntactic sugar around `Tensor.std` and `Tensor.mean` to match `torch.std_mean`.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  std, mean = t.std_mean()
  print(std.numpy(), mean.numpy())
  ```
  """
  return self.std(axis, keepdim, correction), self.mean(axis, keepdim)

softmax

softmax(axis=-1, dtype: DTypeLike | None = None) -> Tensor

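Applies the softmax function to the tensor along the specified axis.

Rescales the elements of the tensor such that they lie in the range [0, 1] and sum to 1.

You can pass in the axis keyword argument to control the axis along which the softmax is computed.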

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())

[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]

print(t.softmax().numpy())
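As a reference for what softmax computes, here is a minimal pure-Python sketch on a single row. It assumes the common max-subtraction trick for numerical stability; whether the private _softmax helper does exactly this is an assumption, and the function below is illustrative only.

```python
import math


def softmax(values):
    m = max(values)                            # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


row = [0.9779, 0.4678, 0.5526]
probs = softmax(row)
print(probs)
print(sum(probs))                 # 1.0 up to floating-point rounding
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Shifting by the maximum does not change the result, because the common factor cancels between numerator and denominator.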

Source code in tinygrad/tensor.py

def softmax(self, axis=-1, dtype:DTypeLike|None=None) -> Tensor:
  """
  Applies the softmax function to the tensor along the specified axis.

  Rescales the elements of the tensor such that they lie in the range [0, 1] and sum to 1.

  You can pass in the `axis` keyword argument to control the axis along which the softmax is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.softmax().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.softmax(axis=0).numpy())
  ```
  """
  if getenv("SINGLE_KERNEL_SOFTMAX"):
    _, e, ss = self.contiguous()._softmax(axis, dtype)
    return e.div(ss).fuse()
  _, e, ss = self._softmax(axis, dtype)
  return e.div(ss)

log_softmax

log_softmax(
    axis=-1, dtype: DTypeLike | None = None
) -> Tensor

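Applies the log-softmax function to the tensor along the specified axis.

The log-softmax function is a numerically stable alternative to the softmax function in log space.

You can pass in the axis keyword argument to control the axis along which the log-softmax is computed.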

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())

[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]

print(t.log_softmax().numpy())

[[-0.8127 -1.3228 -1.238 ]
 [-1.2297 -1.7564 -0.6256]]

print(t.log_softmax(axis=0).numpy())

[[-0.2396 -0.2361 -0.564 ]
 [-1.5463 -1.5594 -0.8414]]
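The listed source returns m - ss.log(), that is, the shifted input minus the log of the sum of exponentials. The pure-Python sketch below follows the same shape on one row, under the assumption that m is the input shifted by its maximum; the inputs are the rounded values printed above, so agreement with the tensor output is approximate.

```python
import math


def log_softmax(values):
    m = max(values)
    shifted = [v - m for v in values]                        # largest entry becomes 0
    log_total = math.log(sum(math.exp(s) for s in shifted))  # log of the sum of exponentials
    return [s - log_total for s in shifted]


row = [0.9779, 0.4678, 0.5526]
print(log_softmax(row))              # close to [-0.8127, -1.3228, -1.238]
print(log_softmax([1000.0, 1000.0])) # both entries equal -log(2), no overflow
```

Exponentiating the result recovers the softmax probabilities, which therefore sum to 1.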

Source code in tinygrad/tensor.py

def log_softmax(self, axis=-1, dtype:DTypeLike|None=None) -> Tensor:
  """
  Applies the log-softmax function to the tensor along the specified axis.

  The log-softmax function is a numerically stable alternative to the softmax function in log space.

  You can pass in the `axis` keyword argument to control the axis along which the log-softmax is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.log_softmax().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.log_softmax(axis=0).numpy())
  ```
  """
  m, _, ss = self._softmax(axis, dtype)
  return m - ss.log()

logsumexp

logsumexp(axis=None, keepdim=False) -> Tensor

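Computes the log-sum-exp of the tensor along the specified axis or axes.

The log-sum-exp function is a numerically stable way to compute the logarithm of the sum of exponentials.

You can pass in the axis and keepdim keyword arguments to control the axis along which the log-sum-exp is computed and whether the reduced dimensions are retained.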

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())

[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]

print(t.logsumexp().numpy())

2.1347282

print(t.logsumexp(axis=0).numpy())

[1.2174 0.7039 1.1167]

print(t.logsumexp(axis=1).numpy())

[1.7906 0.9009]
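The stability claim is easy to see on a scalar example. The sketch below mirrors the formula in the listed source, log(sum(exp(x - m))) + m with m the maximum, in pure Python and contrasts it with the naive computation; the function is an illustration, not tinygrad code.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


big = [1000.0, 1000.0]
try:
    naive = math.log(sum(math.exp(v) for v in big))
except OverflowError:
    naive = None  # math.exp(1000.0) does not fit in a double

print(naive)            # None
print(logsumexp(big))   # about 1000.693, i.e. 1000 + log(2)

values = [0.9779, 0.4678, 0.5526, -0.3288, -0.8555, 0.2753]
print(logsumexp(values))  # close to 2.1347
```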

Source code in tinygrad/tensor.py

def logsumexp(self, axis=None, keepdim=False) -> Tensor:
  """
  Computes the log-sum-exp of the tensor along the specified axis or axes.

  The log-sum-exp function is a numerically stable way to compute the logarithm of the sum of exponentials.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the log-sum-exp is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp(axis=1).numpy())
  ```
  """
  m = self.max(axis=axis, keepdim=True)
  return (self - m).exp().sum(axis=axis, keepdim=keepdim).log() + m.squeeze(axis)

logcumsumexp

logcumsumexp(axis=0) -> Tensor

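Computes the log-cumsum-exp of the tensor along the specified axis.

The log-cumsum-exp function is a numerically stable way to compute the logarithm of the cumulative sum of exponentials.

You can pass in the axis keyword argument to control the axis along which the log-cumsum-exp is computed.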

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())

[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]

print(t.logcumsumexp().numpy())

[[0.9779 0.4678 0.5526]
 [1.2174 0.7039 1.1167]]

print(t.logcumsumexp(axis=0).numpy())

[[0.9779 0.4678 0.5526]
 [1.2174 0.7039 1.1167]]

print(t.logcumsumexp(axis=1).numpy())

[[ 0.9779  1.4481  1.7906]
 [-0.3288  0.1353  0.9009]]
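For intuition, the same quantity can be computed sequentially on one row. Note that this is a different formulation from the listed source, which builds a masked, expanded tensor around a running maximum; the loop below is only a reference sketch that should produce the same numbers for finite inputs.

```python
import math


def logcumsumexp(values):
    out = []
    running = -math.inf  # log(0): the log of an empty sum
    for v in values:
        m = max(running, v)
        # log(exp(running) + exp(v)), computed relative to the larger term
        running = m + math.log(math.exp(running - m) + math.exp(v - m))
        out.append(running)
    return out


row = [0.9779, 0.4678, 0.5526]
print(logcumsumexp(row))  # close to [0.9779, 1.4481, 1.7906]
```

The last entry equals the log-sum-exp of the whole row, which ties this function to logsumexp.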

Source code in tinygrad/tensor.py

def logcumsumexp(self, axis=0) -> Tensor:
  """
  Computes the log-cumsum-exp of the tensor along the specified axis or axes.

  The log-cumsum-exp function is a numerically stable way to compute the logarithm of the cumulative sum of exponentials.

  You can pass in the `axis` keyword argument to control the axis along which
  the log-cum-sum-exp is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp(axis=1).numpy())
  ```
  """
  if self.ndim == 0: return self
  axis = self._resolve_dim(axis)
  x = self.transpose(axis, -1)
  last_dim_size = x.shape[-1]
  x_reshaped = x.reshape(-1, last_dim_size)
  x_cummax = x_reshaped.cummax(-1).unsqueeze(-1)
  x_expand = x_reshaped.unsqueeze(1).expand(*x_reshaped.shape, last_dim_size)
  mask = Tensor.ones(last_dim_size, last_dim_size, requires_grad=False, device=self.device).tril().unsqueeze(0)
  ret = ((x_expand - x_cummax).exp() * mask).sum(-1).log() + x_cummax.squeeze(-1)
  return ret.reshape(*x.shape).transpose(-1, axis)

argmax

argmax(axis=None, keepdim=False) -> Tensor

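Returns the indices of the maximum value of the tensor along the specified axis.

You can pass in the axis and keepdim keyword arguments to control the axis along which the maximum is computed and whether the reduced dimensions are retained.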

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())

[[1 0 2]
 [5 4 3]]

print(t.argmax().numpy()) # Returns the index of the maximum value in the flattened tensor.

3

print(t.argmax(axis=0).numpy()) # Returns the indices of the maximum values along axis 0.

[1 1 1]

print(t.argmax(axis=1).numpy()) # Returns the indices of the maximum values along axis 1.

[2 0]
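The listed source finds the index without an explicit search: it multiplies an equality mask by descending weights n, n-1, ..., 1, takes the max, and subtracts from n. A pure-Python sketch of that trick on one row is shown below; argmax_first is a hypothetical name, and the tie-breaking behaviour described is what this sketch does, inferred from the formula.

```python
def argmax_first(values):
    n = len(values)
    peak = max(values)
    # Position i gets weight n - i where the value equals the max, else 0,
    # so the earliest maximal position carries the largest weight.
    weights = [(n - i) if v == peak else 0 for i, v in enumerate(values)]
    return n - max(weights)


print(argmax_first([1, 0, 2]))  # 2
print(argmax_first([5, 4, 3]))  # 0
print(argmax_first([1, 3, 3]))  # 1, the first of the tied maxima
```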

Source code in tinygrad/tensor.py

def argmax(self, axis=None, keepdim=False) -> Tensor:
  """
  Returns the indices of the maximum value of the tensor along the specified axis.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax().numpy()) # Returns the index of the maximum value in the flattened tensor.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax(axis=0).numpy()) # Returns the indices of the maximum values along axis 0.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax(axis=1).numpy()) # Returns the indices of the maximum values along axis 1.
  ```
  """
  if axis is None: return self.flatten().argmax(0)
  axis = self._resolve_dim(axis)
  m = self == self.max(axis=axis, keepdim=True)
  idx = m * Tensor.arange(self.shape[axis],0,-1, requires_grad=False, device=self.device).reshape(self.shape[axis], *[1]*(self.ndim-axis-1))
  return (self.shape[axis]-idx.max(axis=axis, keepdim=keepdim)).cast(dtypes.int32)

argmin

argmin(axis=None, keepdim=False) -> Tensor

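Returns the indices of the minimum value of the tensor along the specified axis.

You can pass in the axis and keepdim keyword arguments to control the axis along which the minimum is computed and whether the reduced dimensions are retained.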

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())

[[1 0 2]
 [5 4 3]]

print(t.argmin().numpy()) # Returns the index of the minimum value in the flattened tensor.

1

print(t.argmin(axis=0).numpy()) # Returns the indices of the minimum values along axis 0.

[0 0 0]

print(t.argmin(axis=1).numpy()) # Returns the indices of the minimum values along axis 1.

[1 2]

Source code in tinygrad/tensor.py

def argmin(self, axis=None, keepdim=False) -> Tensor:
  """
  Returns the indices of the minimum value of the tensor along the specified axis.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the minimum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin().numpy()) # Returns the index of the minimum value in the flattened tensor.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin(axis=0).numpy()) # Returns the indices of the minimum values along axis 0.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin(axis=1).numpy()) # Returns the indices of the minimum values along axis 1.
  ```
  """
  return self._inverse().argmax(axis=axis, keepdim=keepdim)

Processing

avg_pool2d

avg_pool2d(
    kernel_size: tuple[int, ...] = (2, 2),
    stride=None,
    dilation=1,
    padding: int | tuple[int, ...] = 0,
    ceil_mode=False,
    count_include_pad=True,
) -> Tensor

Applies average pooling over a tensor.

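This function supports three different types of padding:

1. int (single value): applies the same padding value uniformly to all spatial dimensions.
2. tuple[int, ...] (length = number of spatial dimensions): specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).
3. tuple[int, ...] (length = 2 * number of spatial dimensions): specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).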
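One way to read the three padding forms is as inputs to a normalization step that always yields an explicit per-side tuple. The helper below is a hypothetical sketch of such a step, not the library's private _resolve_pool_pads; in particular, the ordering used for the per-dimension form (last dimension first, matching the left/right/top/bottom layout) is an assumption drawn from the forms listed above.

```python
def normalize_pool_padding(padding, dims):
    """Return a flat per-side tuple of length 2 * dims."""
    if isinstance(padding, int):
        return (padding,) * (2 * dims)       # same value on every side
    padding = tuple(padding)
    if len(padding) == 2 * dims:
        return padding                       # already explicit per side
    if len(padding) == dims:
        # (pad_h, pad_w) -> (left, right, top, bottom): last dimension first
        return tuple(p for p in reversed(padding) for _ in range(2))
    raise ValueError(f"padding must be an int or have length {dims} or {2 * dims}")


print(normalize_pool_padding(1, 2))             # (1, 1, 1, 1)
print(normalize_pool_padding((1, 2), 2))        # (2, 2, 1, 1)
print(normalize_pool_padding((1, 2, 3, 4), 2))  # (1, 2, 3, 4)
```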

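When ceil_mode is set to True, the output shape will be determined using ceil division. When count_include_pad is set to False, zero padding will not be included in the averaging calculation.

Note

Unlike PyTorch, this implementation is not limited to 2D pooling and instead works for any number of dimensions.

See: https://paperswithcode.com/method/average-pooling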

t = Tensor.arange(25).reshape(1, 1, 5, 5)
print(t.avg_pool2d().numpy())

[[[[ 3.  5.]
   [13. 15.]]]]

print(t.avg_pool2d(ceil_mode=True).numpy())

[[[[ 3.   5.   6.5]
   [13.  15.  16.5]
   [20.5 22.5 24. ]]]]

print(t.avg_pool2d(padding=1).numpy())

[[[[ 0.    0.75  1.75]
   [ 3.75  9.   11.  ]
   [ 8.75 19.   21.  ]]]]

print(t.avg_pool2d(padding=1, count_include_pad=False).numpy())

[[[[ 0.   1.5  3.5]
   [ 7.5  9.  11. ]
   [17.5 19.  21. ]]]]
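The difference between the last two outputs comes down to the divisor used for windows that overlap the padding. The pure-Python sketch below reproduces the 2x2, stride-2 case on a nested list; avg_pool_2x2 is a hypothetical helper written for this illustration and handles only symmetric zero padding without ceil_mode.

```python
def avg_pool_2x2(image, padding=0, count_include_pad=True):
    """2x2 average pooling, stride 2, symmetric zero padding, on a 2-D list."""
    h, w = len(image), len(image[0])
    out_h = (h + 2 * padding - 2) // 2 + 1
    out_w = (w + 2 * padding - 2) // 2 + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            total, valid = 0.0, 0
            for dy in range(2):
                for dx in range(2):
                    y = oy * 2 + dy - padding  # coordinates in the unpadded image
                    x = ox * 2 + dx - padding
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x]
                        valid += 1
            # Padded cells contribute 0 to the sum; they count in the divisor
            # only when count_include_pad is True. (With padding <= 1 every
            # window contains at least one real cell, so valid is never 0.)
            row.append(total / (4 if count_include_pad else valid))
        out.append(row)
    return out


image = [[r * 5 + c for c in range(5)] for r in range(5)]
print(avg_pool_2x2(image))
print(avg_pool_2x2(image, padding=1))
print(avg_pool_2x2(image, padding=1, count_include_pad=False))
```

For the top-middle window with padding=1, the real cells are 1 and 2: dividing by 4 gives 0.75, while dividing by the 2 real cells gives 1.5, matching the outputs above.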

Source code in tinygrad/tensor.py

def avg_pool2d(self, kernel_size:tuple[int, ...]=(2,2), stride=None, dilation=1, padding:int|tuple[int, ...]=0,
               ceil_mode=False, count_include_pad=True) -> Tensor:
  """
  Applies average pooling over a tensor.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  When `ceil_mode` is set to `True`, output shape will be determined using ceil division.
  When `count_include_pad` is set to `False`, zero padding will not be included in the averaging calculation.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

  See: https://paperswithcode.com/method/average-pooling

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(25).reshape(1, 1, 5, 5)
  print(t.avg_pool2d().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(ceil_mode=True).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(padding=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(padding=1, count_include_pad=False).numpy())
  ```
  """
  axis = tuple(range(-len(k_ := make_tuple(kernel_size, 2)), 0))
  def pool(x:Tensor, padding_:Sequence[int]) -> Tensor: return x.pad(padding_)._pool(k_, stride if stride is not None else k_, dilation)
  reg_pads = self._resolve_pool_pads(padding, len(k_))
  ceil_pads = self._apply_ceil_mode(reg_pads, k_, stride if stride is not None else k_, dilation)
  if not count_include_pad:
    pads = ceil_pads if ceil_mode else reg_pads
    return pool(self, pads).sum(axis) / pool(self.ones_like(), pads).sum(axis)
  if not ceil_mode: return pool(self, reg_pads).mean(axis)
  return pool(self, ceil_pads).sum(axis) / pool(self.pad(reg_pads).ones_like(), tuple(cp-rp for cp,rp in zip(ceil_pads, reg_pads))).sum(axis)

max_pool2d

max_pool2d(
    kernel_size: tuple[int, ...] = (2, 2),
    stride=None,
    dilation=1,
    padding: int | tuple[int, ...] = 0,
    ceil_mode=False,
    return_indices=False,
) -> Tensor | tuple[Tensor, Tensor]

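Applies max pooling over a tensor.

This function supports three different types of padding:

1. int (single value): applies the same padding value uniformly to all spatial dimensions.
2. tuple[int, ...] (length = number of spatial dimensions): specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).
3. tuple[int, ...] (length = 2 * number of spatial dimensions): specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

When ceil_mode is set to True, the output shape will be determined using ceil division. When return_indices is set to True, the argmax will be returned along with the max values.

Note

Unlike PyTorch, this implementation is not limited to 2D pooling and instead works for any number of dimensions.

See: https://paperswithcode.com/method/max-pooling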

t = Tensor.arange(25).reshape(1, 1, 5, 5)
print(t.max_pool2d().numpy())

[[[[ 6  8]
   [16 18]]]]

print(t.max_pool2d(ceil_mode=True).numpy())

[[[[ 6  8  9]
   [16 18 19]
   [21 23 24]]]]

print(t.max_pool2d(padding=1).numpy())

[[[[ 0  2  4]
   [10 12 14]
   [20 22 24]]]]

Source code in tinygrad/tensor.py

def max_pool2d(self, kernel_size:tuple[int, ...]=(2,2), stride=None, dilation=1, padding:int|tuple[int, ...]=0,
               ceil_mode=False, return_indices=False) -> Tensor | tuple[Tensor, Tensor]:
  """
  Applies max pooling over a tensor.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  When `ceil_mode` is set to `True`, output shape will be determined using ceil division.
  When `return_indices` is set to `True`, the argmax will be returned along with the max values.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

  See: https://paperswithcode.com/method/max-pooling

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(25).reshape(1, 1, 5, 5)
  print(t.max_pool2d().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max_pool2d(ceil_mode=True).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max_pool2d(padding=1).numpy())
  ```
  """
  axis = tuple(range(-len(k_ := make_tuple(kernel_size, 2)), 0))
  pads = self._resolve_pool_pads(padding, len(k_))
  if ceil_mode: pads = self._apply_ceil_mode(pads, k_, stride if stride is not None else k_, dilation)
  pooled = self.pad(pads, value=dtypes.min(self.dtype))._pool(k_, stride if stride is not None else k_, dilation)
  if not return_indices: return pooled.max(axis)
  spatial_sz = math.prod(spatial_shape := self.shape[-len(k_):])
  idx = Tensor.arange(spatial_sz,0,-1, requires_grad=False, device=self.device).reshape(spatial_shape)
  m = pooled == pooled.max(axis, keepdim=True)
  idx = m * idx.pad(pads, value=dtypes.min(idx.dtype))._pool(k_, stride if stride is not None else k_, dilation)
  return pooled.max(axis), spatial_sz - idx.max(axis)

max_unpool2d ¤

max_unpool2d(
    indices: Tensor,
    kernel_size: tuple[int, ...] = (2, 2),
    stride=None,
    dilation=1,
    padding: int | tuple[int, ...] = 0,
    output_size=None,
)

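Performs a partial inverse of max_pool2d using the indices from the argmax.

When output_size is provided, it resolves the otherwise ambiguous output shape to the given shape.

Note

Unlike PyTorch, this implementation is not limited to 2D pooling and works for any number of dimensions.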

t = Tensor.arange(1, 17).reshape(1, 1, 4, 4)
print(t.numpy())

[[[[ 1  2  3  4]
   [ 5  6  7  8]
   [ 9 10 11 12]
   [13 14 15 16]]]]

output, indices = Tensor.max_pool2d(t, return_indices=True)
print(output.numpy())
print(indices.numpy())

[[[[ 6  8]
   [14 16]]]]
[[[[ 5  7]
   [13 15]]]]

print(Tensor.max_unpool2d(output, indices).numpy())

[[[[ 0  0  0  0]
   [ 0  6  0  8]
   [ 0  0  0  0]
   [ 0 14  0 16]]]]
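
To make the role of the indices concrete, here is a plain-Python sketch of a 2x2, stride-2 pooling step that records flat indices into the spatial dimensions, followed by the partial inverse. The function names are hypothetical and the code handles a single 2D image only; `unpool_out_size` restates the default output-size expression from the source shown below.

```python
def max_pool_2x2(img):
    """2x2 max pooling with stride 2; also returns flat argmax indices."""
    h, w = len(img), len(img[0])
    values, indices = [], []
    for y in range(0, h - 1, 2):
        vrow, irow = [], []
        for x in range(0, w - 1, 2):
            window = [(img[y + dy][x + dx], (y + dy) * w + (x + dx))
                      for dy in (0, 1) for dx in (0, 1)]
            best = max(window, key=lambda item: item[0])
            vrow.append(best[0])
            irow.append(best[1])
        values.append(vrow)
        indices.append(irow)
    return values, indices

def max_unpool(values, indices, h, w):
    """Write each pooled value back to its recorded position; rest stays 0."""
    flat = [0] * (h * w)
    for vrow, irow in zip(values, indices):
        for v, i in zip(vrow, irow):
            flat[i] = v
    return [flat[r * w:(r + 1) * w] for r in range(h)]

def unpool_out_size(i, k, s, d=1, pad=(0, 0)):
    """Default output length when output_size is not given."""
    return (i - 1) * s - sum(pad) + d * (k - 1) + 1

t = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled, idx = max_pool_2x2(t)
size = unpool_out_size(len(pooled), 2, 2)
print(pooled, idx)
print(max_unpool(pooled, idx, size, size))
```

The inverse is only partial: every position that was not a window maximum comes back as 0.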

Source code in tinygrad/tensor.py

def max_unpool2d(self, indices:Tensor, kernel_size:tuple[int, ...]=(2,2), stride=None, dilation=1, padding:int|tuple[int, ...]=0, output_size=None):
  """
  Performs a partial inverse of `max_pool2d` using the indices from the argmax.

  When `output_size` is provided, the output shape disambiguates to the provided shape.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(1, 17).reshape(1, 1, 4, 4)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  output, indices = Tensor.max_pool2d(t, return_indices=True)
  print(output.numpy())
  print(indices.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.max_unpool2d(output, indices).numpy())
  ```
  """
  bs,c,*spatial_shape = self.shape
  if output_size is None:
    k_,d_,s_ = (make_tuple(x, len(spatial_shape)) for x in (kernel_size, dilation, stride if stride is not None else kernel_size))
    p_ = _flat_to_grouped(self._resolve_pool_pads(padding, len(spatial_shape)))
    # https://arxiv.org/pdf/1603.07285 inverse of relationship 15 in section 5.1.
    output_size = tuple((i-1)*s - (pB+pA) + (d*(k-1)+1) for i,k,d,s,(pA,pB) in zip(spatial_shape,k_,d_,s_,p_))
  else: output_size = output_size[-len(spatial_shape):]
  ret = (indices.reshape(bs,c,1,-1)._one_hot_along_dim(prod(output_size), 2) * self.reshape(bs,c,1,-1)).sum(3)
  return ret.reshape(bs,c,*output_size)

conv2d ¤

conv2d(
    weight: Tensor,
    bias: Tensor | None = None,
    groups=1,
    stride=1,
    dilation=1,
    padding: int | tuple[int, ...] = 0,
    dtype: DTypeLike | None = None,
) -> Tensor

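Applies a convolution over a tensor with a given weight and optional bias.

This function supports three different types of padding:

int (single value): Applies the same padding value uniformly to all spatial dimensions.
tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).
tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

Note

Unlike PyTorch, this implementation is not limited to 2D convolutions and works for any number of dimensions.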

See: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
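
The three padding forms described above can all be reduced to the explicit per-side form. The sketch below shows one way to do that; `resolve_pads` is a hypothetical helper, and the expansion order for the per-dimension form (last dimension first, so that width maps to left/right) is an assumption derived from the two layouts documented above, not a statement about tinygrad internals.

```python
def resolve_pads(padding, ndims):
    """Normalize padding to (left, right, top, bottom, ...) (hypothetical helper)."""
    if isinstance(padding, int):
        return (padding,) * (2 * ndims)       # same value on every side
    padding = tuple(padding)
    if len(padding) == 2 * ndims:
        return padding                        # already explicit per side
    if len(padding) == ndims:
        flat = []
        for p in reversed(padding):           # assumed: last dimension comes first
            flat += [p, p]
        return tuple(flat)
    raise ValueError(f"padding must have length {ndims} or {2 * ndims}")

print(resolve_pads(1, 2))
print(resolve_pads((1, 2), 2))
print(resolve_pads((1, 2, 3, 4), 2))
```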

t = Tensor.arange(9).reshape(1, 1, 3, 3)
w = Tensor.ones(1, 1, 2, 2)
print(t.conv2d(w).numpy())

[[[[ 8. 12.]
   [20. 24.]]]]
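
The result above can be checked by hand with a direct sliding-window sum. The following is a plain-Python sketch for one input channel and one output channel, without padding, dilation or groups; `conv2d_single` is a hypothetical name and the loop is for illustration only.

```python
def conv2d_single(img, kernel, stride=1):
    """Sliding-window sum of products over a single 2D image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(img) - kh) // stride + 1
    out_w = (len(img[0]) - kw) // stride + 1
    return [[sum(img[y * stride + i][x * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

t = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
w = [[1, 1], [1, 1]]
print(conv2d_single(t, w))
```

As in most deep learning libraries, the kernel is not flipped, so this is a cross-correlation in the strict signal-processing sense.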

Source code in tinygrad/tensor.py

def conv2d(self, weight:Tensor, bias:Tensor|None=None, groups=1, stride=1, dilation=1, padding:int|tuple[int, ...]=0,
           dtype:DTypeLike|None=None) -> Tensor:
  """
  Applies a convolution over a tensor with a given `weight` and optional `bias`.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d convolutions and instead works for any number of dimensions.

  See: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(9).reshape(1, 1, 3, 3)
  w = Tensor.ones(1, 1, 2, 2)
  print(t.conv2d(w).numpy())
  ```
  """
  if IMAGE: return self.image_conv2d(weight, bias, groups, stride, dilation, padding, dtype)
  (bs,cin_), (cout,cin), HW = self.shape[:2], weight.shape[:2], weight.shape[2:]
  padding_ = self._resolve_pool_pads(padding, len(HW))
  assert groups*cin == cin_ and len(self.shape) == len(weight.shape), f"Input Tensor shape {self.shape} does not match the shape of the weights {weight.shape}. ({groups*cin} vs. {cin_})"  # noqa: E501

  # conv2d is a pooling op (with padding)
  x = self.pad(padding_)._pool(HW, stride, dilation)   # (bs, groups*cin, oy, ox, H, W)
  rcout, oyx = cout//groups, x.shape[2:-len(HW)]
  if not all(x == 3 for x in HW) or stride != 1 or dilation != 1 or not WINO:
    # normal conv
    x = x.reshape(bs, groups, cin, 1, *oyx, *HW).expand(bs, groups, cin, rcout, *oyx, *HW).permute(0,1,3,*[4+i for i in range(len(oyx))],2,*[4+len(oyx)+i for i in range(len(HW))])  # noqa: E501

    # conv! broadcasted to (bs, groups, rcout, *oyx, cin, *HW)
    ret = (x * weight.reshape(1, groups, rcout, *[1] * len(oyx), cin, *HW)).sum([-1-i for i in range(1+len(oyx))], keepdim=True, dtype=dtype).reshape(bs, cout, *oyx)  # noqa: E501
    return ret if bias is None else ret.add(bias.reshape(1, -1, *[1] * len(HW)))

  HWI, HWO = (6,) * len(HW), (4,) * len(HW)  # F(4x4,3x3) winograd tiles
  winograd_G = [[1/4, 0, 0], [-1/6, -1/6, -1/6], [-1/6, 1/6, -1/6], [1/24, 1/12, 1/6], [1/24, -1/12, 1/6], [0, 0, 1]]
  winograd_Bt = [[4, 0, -5, 0, 1, 0], [0, -4, -4, 1, 1, 0], [0, 4, -4, -1, 1, 0], [0, -2, -1, 2, 1, 0], [0, 2, -1, -2, 1, 0], [0, 4, 0, -5, 0, 1]]
  winograd_At = [[1, 1, 1, 1, 1, 0], [0, 1, -1, 2, -2, 0], [0, 1, 1, 4, 4, 0], [0, 1, -1, 8, -8, 1]] # applying At in pre-order doubles compile time

  # todo: stride == dilation
  # use padding to round up to 4x4 output tiles
  # (bs, cin_, tyx, HWI)
  d = self.pad(sum([[padding_[i*2], padding_[i*2+1] + (-(dim + sum(padding_[i * 2:(i + 1) * 2]) - 2) % 4)] for i, dim in enumerate(self.shape[-len(HW):])], []))._pool(HWI, HWO)  # noqa: E501
  # move HW to the front: # (HWI, bs, cin_, tyx)
  d = d.permute(*range(len(d.shape)-len(HW),len(d.shape)), *range(len(d.shape)-len(HW)))
  tyx = d.shape[-len(HWI):]  # dim of tiling

  g = weight.permute(*range(len(weight.shape)-len(HW),len(weight.shape)), *range(len(weight.shape)-len(HW)))  # move HW to the front

  # compute 6x6 winograd tiles: GgGt, BtdB
  # (HWI, groups * rcout, cin) -> (HWI, bs=1, groups, rcout, cin, tyx=(1,1))
  gfactors = _apply_winograd_matrix(winograd_G, g, len(HW)).reshape(*HWI, 1, groups, rcout, cin, *([1]*len(tyx)))
  # (HWI, bs, cin_, tyx) -> (HWI, bs, groups, 1 ,cin, *tyx)
  dfactors = _apply_winograd_matrix(winograd_Bt, d, len(HW)).reshape(*HWI, bs, groups, 1, cin, *tyx)

  # matmul; sum across cin: (HWI, bs, groups, rcout, *tyx); then HWI -> HWO: (HWO, bs, groups, rcout, *tyx)
  ret = _apply_winograd_matrix(winograd_At, (gfactors * dfactors).sum(axis=-1-len(HW), dtype=dtype), len(HW))

  # interleave tyx and HWO: (bs, groups, rcout, oy, HO, ox, WO)
  ret = ret.permute([*range(len(HW), len(ret.shape)-len(HW)), *[i+o for i in range(len(HW)) for o in [len(ret.shape)-len(HW),0]]])
  # merge groups and rcout, tyx and HWO: (bs, groups, cout, *yx), shrink to final
  ret = ret.reshape(bs, cout, *[c * HWO[i] for i, c in enumerate(tyx)]).shrink(tuple((0, s) for s in [bs, cout, *oyx]))

  return (ret if bias is None else ret.add(bias.reshape(1, -1, *[1 for _ in range(len(HW))]))).contiguous().contiguous_backward()

conv_transpose2d ¤

conv_transpose2d(
    weight: Tensor,
    bias: Tensor | None = None,
    groups=1,
    stride=1,
    dilation=1,
    padding=0,
    output_padding=0,
) -> Tensor

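Applies a transposed convolution over a tensor with a given weight and optional bias.

This function supports three different types of padding:

int (single value): Applies the same padding value uniformly to all spatial dimensions.
tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).
tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

Note

Unlike PyTorch, this implementation is not limited to 2D transposed convolutions and works for any number of dimensions.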

See: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

t = Tensor.arange(9).reshape(1, 1, 3, 3)
w = Tensor.ones(1, 1, 2, 2)
print(t.conv_transpose2d(w).numpy())

[[[[ 0.  1.  3.  2.]
   [ 3.  8. 12.  7.]
   [ 9. 20. 24. 13.]
   [ 6. 13. 15.  8.]]]]
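
One way to read the output above is as a scatter-add: every input element stamps a scaled copy of the kernel into the output. The sketch below uses that formulation for a single channel without padding or dilation. It is an illustration with a hypothetical function name; the source shown below instead rewrites the operation as a regular conv2d on a flipped kernel.

```python
def conv_transpose2d_single(img, kernel, stride=1):
    """Each input element adds img[y][x] * kernel into the output."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * ((w - 1) * stride + kw) for _ in range((h - 1) * stride + kh)]
    for y in range(h):
        for x in range(w):
            for i in range(kh):
                for j in range(kw):
                    out[y * stride + i][x * stride + j] += img[y][x] * kernel[i][j]
    return out

t = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
w = [[1, 1], [1, 1]]
for row in conv_transpose2d_single(t, w):
    print(row)
```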

Source code in tinygrad/tensor.py

def conv_transpose2d(self, weight:Tensor, bias:Tensor|None=None, groups=1, stride=1, dilation=1, padding=0, output_padding=0) -> Tensor:
  """
  Applies a transposed convolution over a tensor with a given `weight` and optional `bias`.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d transposed convolutions and instead works for any number of dimensions.

  See: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(9).reshape(1, 1, 3, 3)
  w = Tensor.ones(1, 1, 2, 2)
  print(t.conv_transpose2d(w).numpy())
  ```
  """
  x, w = self, weight.unflatten(0, (groups, -1)).transpose(1, 2).flip(*range(3, len(weight.shape)+1))
  HW = weight.shape[2:]
  padding = _flat_to_grouped(self._resolve_pool_pads(padding, len(HW)))
  stride, dilation, output_padding = [make_tuple(x, len(HW)) for x in (stride, dilation, output_padding)]
  if any(s>1 for s in stride):
    # handle strides: (k) -> reshape -> (k,1) -> pad -> (k,s) -> reshape -> (k*s) -> shrink (k-(s-1))
    x = x.reshape(None, None, *flatten((k,1) for k in x.shape[2:]))
    x = x.pad((None, None, *flatten((None,(0,s-1)) for s in stride)))
    x = x.reshape(None, None, *[k*s for k,s in zip(x.shape[2::2], stride)])
    x = x.shrink((None, None, *[(0,k-(s-1)) for k,s in zip(x.shape[2:], stride)]))
  padding = flatten((((k-1)*d-pB,(k-1)*d-pA+op) for k,d,(pB,pA),op in reversed(list(zip(HW, dilation, padding, output_padding)))))
  return x.conv2d(w.flatten(end_dim=1), groups=groups, bias=bias, dilation=dilation, padding=padding)

dot ¤

dot(w: Tensor, dtype: DTypeLike | None = None) -> Tensor

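Performs a dot product between two tensors. If w is 1-D, it is a sum product over the last axis of self and w. If w is N-D with N>=2, it is a sum product over the last axis of self and the second-to-last axis of w.

You can pass in the optional dtype keyword argument to control the data type of the accumulation.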

a = Tensor([1, 2, 3])
b = Tensor([1, 1, 0])
print(a.dot(b).numpy())

3

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print(a.dot(b).numpy())

[[19 22]
 [43 50]]
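
The two cases of the axis rule can be written out in plain Python. This is a sketch on nested lists with hypothetical helper names, covering only the 1-D and 2-D cases.

```python
def dot_1d(a, b):
    """1-D case: sum product over the only axis of both operands."""
    return sum(x * y for x, y in zip(a, b))

def dot_2d(a, b):
    """2-D case: last axis of a against the second-to-last axis of b."""
    return [[dot_1d(row, col) for col in zip(*b)] for row in a]

print(dot_1d([1, 2, 3], [1, 1, 0]))
print(dot_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```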

Source code in tinygrad/tensor.py

def dot(self, w:Tensor, dtype:DTypeLike|None=None) -> Tensor:

  """
  Performs dot product between two tensors.
  If `w` is 1-D, it's a sum product over the last axis of `self` and `w`.
  If `w` is N-D with N>=2, it's a sum product over the last axis of `self` and the second-to-last axis of `w`.

  You can pass in the optional `dtype` keyword argument to control the data type of the accumulation.

  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([1, 2, 3])
  b = Tensor([1, 1, 0])
  print(a.dot(b).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([[1, 2], [3, 4]])
  b = Tensor([[5, 6], [7, 8]])
  print(a.dot(b).numpy())
  ```
  """
  if IMAGE: return self.image_dot(w, dtype)
  x, dx, dw = self, self.ndim, w.ndim
  if not (dx > 0 and dw > 0): raise RuntimeError(f"both tensors need to be at least 1D, got {dx}D and {dw}D")
  if x.shape[-1] != w.shape[axis_w:=-min(w.ndim,2)]: raise RuntimeError(f"cannot dot {x.shape} and {w.shape}")
  x = x.reshape(*x.shape[0:-1], *[1]*min(dx-1, dw-1, 1), x.shape[-1])
  w = w.reshape(*w.shape[0:-2], *[1]*min(dx-1, dw-1, 1), *w.shape[axis_w:]).transpose(-1, axis_w)
  return (x*w).sum(-1, dtype=dtype).cast(least_upper_dtype(x.dtype, w.dtype) if dtype is None else dtype)

matmul ¤

matmul(
    x: Tensor, reverse=False, dtype: DTypeLike | None = None
) -> Tensor

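Performs matrix multiplication between two tensors.

You can pass in the reverse keyword argument to control the order of the matrix multiplication. You can pass in the optional dtype keyword argument to control the data type of the accumulation.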

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print(a.matmul(b).numpy())

[[19 22]
 [43 50]]
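
Since matrix multiplication is not commutative, reverse changes the result. A small plain-Python sketch of the operand swap, with a hypothetical standalone `matmul` on nested lists:

```python
def matmul(a, b, reverse=False):
    """a @ b by default, b @ a when reverse is True."""
    left, right = (b, a) if reverse else (a, b)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
            for row in left]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))
print(matmul(a, b, reverse=True))
```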

Source code in tinygrad/tensor.py

def matmul(self, x:Tensor, reverse=False, dtype:DTypeLike|None=None) -> Tensor:
  """
  Performs matrix multiplication between two tensors.

  You can pass in the `reverse` keyword argument to control the order of the matrix multiplication.
  You can pass in the optional `dtype` keyword argument to control the data type of the accumulation.

  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([[1, 2], [3, 4]])
  b = Tensor([[5, 6], [7, 8]])
  print(a.matmul(b).numpy())
  ```
  """
  return x.dot(self, dtype=dtype) if reverse else self.dot(x, dtype=dtype)

einsum staticmethod ¤

einsum(
    formula: str,
    *operands: Tensor | Sequence[Tensor],
    dtype: DTypeLike | None = None
) -> Tensor

Sums the product of the elements of the input tensors according to a formula based on the Einstein summation convention.

See: https://pytorch.org/docs/stable/generated/torch.einsum.html

x = Tensor([[1, 2], [3, 4]])
y = Tensor([[5, 6], [7, 8]])
print(Tensor.einsum("ij,ij->", x, y).numpy())

70
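
The convention can be spelled out as a brute-force loop: iterate over every combination of index values, multiply the matching elements, and accumulate into the cell named by the output letters. The sketch below does this on nested lists. `tiny_einsum` is a hypothetical helper that requires an explicit `->`, does not support `...`, and returns a dict keyed by output index rather than a tensor.

```python
from itertools import product

def shape_of(x):
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return tuple(shape)

def element(x, idx):
    for i in idx:
        x = x[i]
    return x

def tiny_einsum(formula, *operands):
    lhs, out = formula.replace(" ", "").split("->")
    inputs = lhs.split(",")
    sizes = {}
    for letters, op in zip(inputs, operands):
        sizes.update(zip(letters, shape_of(op)))   # size of each index letter
    letters = sorted(sizes)
    result = {}
    for values in product(*(range(sizes[l]) for l in letters)):
        env = dict(zip(letters, values))
        term = 1
        for ls, op in zip(inputs, operands):
            term *= element(op, [env[l] for l in ls])
        key = tuple(env[l] for l in out)           # letters missing from out are summed
        result[key] = result.get(key, 0) + term
    return result

x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
print(tiny_einsum("ij,ij->", x, y))
print(tiny_einsum("ij,jk->ik", x, y))
```

The first formula sums the elementwise product; the second reproduces matrix multiplication.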

Source code in tinygrad/tensor.py

@staticmethod
def einsum(formula:str, *operands:Tensor|Sequence[Tensor], dtype:DTypeLike|None=None) -> Tensor:
  """
  Sums the product of the elements of the input tensors according to a formula based on the Einstein summation convention.

  See: https://pytorch.org/docs/stable/generated/torch.einsum.html

  ```python exec="true" source="above" session="tensor" result="python"
  x = Tensor([[1, 2], [3, 4]])
  y = Tensor([[5, 6], [7, 8]])
  print(Tensor.einsum("ij,ij->", x, y).numpy())
  ```
  """
  def parse_formula(formula:str, *operands:Tensor):
    if "..." in (formula := formula.replace(" ", "")):
      ell_chars, ell_longest = "".join(set(string.ascii_letters) - set(formula)), 0
      for i, inp in enumerate(filter(lambda x: "..." in x, inputs := formula.split("->")[0].split(","))):
        if (ell_count := max(operands[i].ndim, 1) - (len(inp) - len("..."))) > ell_longest: ell_longest = ell_count
        inputs[i] = inp.replace("...", ell_chars[-ell_count:])
      inputs_str, out_ellipse = ",".join(inputs), ell_chars[-ell_longest:]
      return (inputs_str, formula.split("->")[1].replace("...", out_ellipse)) if "->" in formula else \
        (inputs_str, out_ellipse + ''.join(sorted(c for c in inputs_str if inputs_str.count(c) == 1 and c.isalpha() and c not in out_ellipse)))
    return formula.split("->") if "->" in formula else (formula, ''.join(c for c in sorted(formula) if formula.count(c) == 1 and c.isalpha()))

  xs:tuple[Tensor, ...] = argfix(*operands)
  inputs_str, output = parse_formula(formula, *xs)
  inputs = inputs_str.split(",")
  assert len(xs) == len(inputs), f"number of inputs doesn't match number of operands in formula, expected {len(inputs)}, got {len(xs)}"

  # map the value of each letter in the formula
  letter_val = sorted(merge_dicts([dict(zip(letters, tensor.shape)) for letters, tensor in zip(inputs, xs)]).items())

  xs_:list[Tensor] = []
  lhs = [sorted(enumerate(s), key=lambda e:e[1]) for s in inputs]
  for x,(order,letters) in zip(xs, [list(zip(*l)) for l in lhs]):
    # permute to the sorted letter order, then reshape/expand to create dimensions for the missing letters
    xs_.append(x.permute(order).reshape([val if letter in letters else 1 for letter,val in letter_val]).expand([val for _,val in letter_val]))

  # ordinal encode the output alphabet
  rhs_order = argsort(argsort(list(output)))

  # sum over all axes that's not in the output, then permute to the output order
  return functools.reduce(lambda a,b:a*b, xs_) \
    .sum(axis=[axis for axis,(letter,_) in enumerate(letter_val) if letter not in output], dtype=dtype).permute(rhs_order)

cumsum ¤

cumsum(axis: int = 0) -> Tensor

Computes the cumulative sum of the tensor along the specified axis.

t = Tensor.ones(2, 3)
print(t.numpy())

[[1. 1. 1.]
 [1. 1. 1.]]

print(t.cumsum(1).numpy())

[[1. 2. 3.]
 [1. 2. 3.]]
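
The effect of axis is easiest to see side by side. A plain-Python sketch for 2D nested lists, using `itertools.accumulate`; `cumsum_2d` is a hypothetical helper.

```python
from itertools import accumulate

def cumsum_2d(m, axis=0):
    """Running sum down the columns (axis=0) or along the rows (axis=1)."""
    if axis == 1:
        return [list(accumulate(row)) for row in m]
    cols = [list(accumulate(col)) for col in zip(*m)]
    return [list(row) for row in zip(*cols)]

ones = [[1, 1, 1], [1, 1, 1]]
print(cumsum_2d(ones, axis=1))
print(cumsum_2d(ones, axis=0))
```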

Source code in tinygrad/tensor.py

def cumsum(self, axis:int=0) -> Tensor:
  """
  Computes the cumulative sum of the tensor along the specified `axis`.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.ones(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.cumsum(1).numpy())
  ```
  """
  return self._split_cumalu(axis, Ops.ADD)

cummax ¤

cummax(axis: int = 0) -> Tensor

Computes the cumulative max of the tensor along the specified axis.

t = Tensor([0, 1, -1, 2, -2, 3, -3])
print(t.numpy())

[ 0  1 -1  2 -2  3 -3]

print(t.cummax(0).numpy())

[0 1 1 2 2 3 3]
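
A cumulative max is the same scan as a cumulative sum with `max` as the combining function. In plain Python, on a flat list:

```python
from itertools import accumulate

values = [0, 1, -1, 2, -2, 3, -3]
running_max = list(accumulate(values, max))   # largest value seen so far
print(running_max)
```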

Source code in tinygrad/tensor.py

def cummax(self, axis:int=0) -> Tensor:
  """
  Computes the cumulative max of the tensor along the specified `axis`.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0, 1, -1, 2, -2, 3, -3])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.cummax(0).numpy())
  ```
  """
  return self._split_cumalu(axis, Ops.MAX)

triu ¤

triu(diagonal: int = 0) -> Tensor

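Returns the upper triangular part of the tensor; the other elements are set to 0.

The argument diagonal determines which diagonal is on the boundary. diagonal = 0 means the main diagonal. Positive diagonal means above the main diagonal, and negative diagonal means below the main diagonal.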

t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(t.numpy())

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

print(t.triu(diagonal=0).numpy())

[[ 1  2  3  4]
 [ 0  6  7  8]
 [ 0  0 11 12]]

print(t.triu(diagonal=1).numpy())

[[ 0  2  3  4]
 [ 0  0  7  8]
 [ 0  0  0 12]]

print(t.triu(diagonal=-1).numpy())

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 0 10 11 12]]
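
In index terms, the element at row r and column c is kept when c - r >= diagonal. A plain-Python sketch of that rule, with a hypothetical function name:

```python
def triu(m, diagonal=0):
    """Keep m[r][c] where c - r >= diagonal, zero elsewhere."""
    return [[v if c - r >= diagonal else 0 for c, v in enumerate(row)]
            for r, row in enumerate(m)]

t = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for d in (0, 1, -1):
    print(d, triu(t, d))
```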

Source code in tinygrad/tensor.py

def triu(self, diagonal:int=0) -> Tensor:
  """
  Returns the upper triangular part of the tensor, the other elements are set to 0.

  The argument `diagonal` determines which diagonal is on the boundary. `diagonal = 0` means the main diagonal.
  Positive `diagonal` means above the main diagonal, and negative `diagonal` means below the main diagonal.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=-1).numpy())
  ```
  """
  return Tensor._tri(self.shape[-2], self.shape[-1], diagonal=diagonal, device=self.device, dtype=dtypes.bool).where(self, 0).cast(self.dtype)

tril ¤

tril(diagonal: int = 0) -> Tensor

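Returns the lower triangular part of the tensor; the other elements are set to 0.

The argument diagonal determines which diagonal is on the boundary. diagonal = 0 means the main diagonal. Positive diagonal means above the main diagonal, and negative diagonal means below the main diagonal.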

t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(t.numpy())

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

print(t.tril(diagonal=0).numpy())

[[ 1  0  0  0]
 [ 5  6  0  0]
 [ 9 10 11  0]]

print(t.tril(diagonal=1).numpy())

[[ 1  2  0  0]
 [ 5  6  7  0]
 [ 9 10 11 12]]

print(t.tril(diagonal=-1).numpy())

[[ 0  0  0  0]
 [ 5  0  0  0]
 [ 9 10  0  0]]
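
Here the element at row r and column c is kept when c - r <= diagonal. One consequence, sketched below in plain Python with hypothetical helper names, is that the upper part with boundary d and the lower part with boundary d - 1 split the matrix without overlap.

```python
def tril(m, diagonal=0):
    """Keep m[r][c] where c - r <= diagonal, zero elsewhere."""
    return [[v if c - r <= diagonal else 0 for c, v in enumerate(row)]
            for r, row in enumerate(m)]

def triu(m, diagonal=0):
    """Keep m[r][c] where c - r >= diagonal, zero elsewhere."""
    return [[v if c - r >= diagonal else 0 for c, v in enumerate(row)]
            for r, row in enumerate(m)]

t = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
upper, lower = triu(t, 1), tril(t, 0)
recombined = [[u + l for u, l in zip(ur, lr)] for ur, lr in zip(upper, lower)]
print(lower)
print(recombined == t)
```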

Source code in tinygrad/tensor.py

def tril(self, diagonal:int=0) -> Tensor:
  """
  Returns the lower triangular part of the tensor, the other elements are set to 0.

  The argument `diagonal` determines which diagonal is on the boundary. `diagonal = 0` means the main diagonal.
  Positive `diagonal` means above the main diagonal, and negative `diagonal` means below the main diagonal.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=-1).numpy())
  ```
  """
  return Tensor._tri(self.shape[-2], self.shape[-1], diagonal=diagonal+1, device=self.device, dtype=dtypes.bool).where(0, self).cast(self.dtype)

interpolate ¤

interpolate(
    size: tuple[int, ...],
    mode: str = "linear",
    align_corners: bool = False,
) -> Tensor

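Downsamples or upsamples to the given size; accepts 0 to N batch dimensions.

The interpolation algorithm is selected with mode, which currently supports only linear, nearest and nearest-exact. To run bilinear or trilinear interpolation, pass in a 2D or 3D size.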

t = Tensor([[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]])
print(t.numpy())

[[ 1  2  3  4]
 [21 22 23 24]
 [41 42 43 44]]

print(t.interpolate(size=(2,3), mode="linear").numpy())

[[ 6  7  8]
 [36 37 38]]
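
The values above can be reproduced with the half-pixel coordinate mapping that the source below uses for linear mode: each output position j maps to scale * (j + 0.5) - 0.5 in the input, clipped to the valid range, and the two neighbouring samples are blended. The sketch is plain Python with hypothetical helper names, for a 2D list without batch dimensions; truncating to int at the end stands in for the final cast back to the input dtype.

```python
import math

def source_coords(in_size, out_size, align_corners=False):
    a = int(align_corners)
    scale = (in_size - a) / (out_size - a)
    coords = []
    for j in range(out_size):
        pos = scale * j if align_corners else scale * (j + 0.5) - 0.5
        coords.append(min(max(pos, 0.0), in_size - 1))   # clip to valid range
    return coords

def lerp_1d(samples, out_size):
    out = []
    for pos in source_coords(len(samples), out_size):
        lo, hi = math.floor(pos), math.ceil(pos)
        out.append(samples[lo] + (samples[hi] - samples[lo]) * (pos - lo))
    return out

def bilinear(img, out_h, out_w):
    rows = [lerp_1d(r, out_w) for r in img]                  # resample width
    cols = [lerp_1d(list(c), out_h) for c in zip(*rows)]     # resample height
    return [list(r) for r in zip(*cols)]

t = [[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]]
result = [[int(v) for v in row] for row in bilinear(t, 2, 3)]
print(result)
```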

Source code in tinygrad/tensor.py

def interpolate(self, size:tuple[int, ...], mode:str="linear", align_corners:bool=False) -> Tensor:
  """
  Downsamples or Upsamples to the input `size`, accepts 0 to N batch dimensions.

  The interpolation algorithm is selected with `mode` which currently only supports `linear`, `nearest` and `nearest-exact`.
  To run `bilinear` or `trilinear`, pass in a 2D or 3D size.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.interpolate(size=(2,3), mode="linear").numpy())
  ```
  """
  assert isinstance(size, (tuple,list)) and all_int(size) and 0 < len(size) <= self.ndim, f"invalid {size=}"
  assert mode in ("linear", "nearest", "nearest-exact"), "only supports linear, nearest or nearest-exact interpolate"
  assert not (align_corners and mode != "linear"), "align_corners option can only be set with the interpolating mode linear"
  x, expand = self, list(self.shape)
  for i in range(-1,-len(size)-1,-1):
    scale = (self.shape[i] - int(align_corners)) / (size[i] - int(align_corners))
    arr, reshape = Tensor.arange(size[i], dtype=dtypes.float32, device=self.device), [1] * self.ndim
    reshape[i] = expand[i] = size[i]
    if mode == "linear":
      index = (scale*arr if align_corners else (scale*(arr+0.5))-0.5).clip(0, self.shape[i]-1)
      low, high, perc = [y.reshape(reshape).expand(expand) for y in (index.floor().int(), index.ceil().int(), index - index.floor())]
      x = x.gather(i, low).lerp(x.gather(i, high), perc)
    else:
      index = (scale*(arr+0.5) if mode=="nearest-exact" else scale*arr).cast(dtypes.int32).reshape(reshape).expand(expand)
      x = x.gather(i, index)
  return x.cast(self.dtype)

scatter ¤

scatter(
    dim: int,
    index: Tensor,
    src: Tensor | ConstType,
    reduce: Literal["multiply", "add"] | None = None,
) -> Tensor

Scatters src values along an axis specified by dim. Apply an add or multiply reduction operation with reduce.

Note

To use the reduce argument with a Tensor src, see Tensor.scatter_reduce.

src = Tensor.arange(1, 11).reshape(2, 5)
print(src.numpy())

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

index = Tensor([[0, 1, 2, 0]])
print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(0, index, src).numpy())

[[1 0 0 4 0]
 [0 2 0 0 0]
 [0 0 3 0 0]]

index = Tensor([[0, 1, 2], [0, 1, 4]])
print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(1, index, src).numpy())

[[1 2 3 0 0]
 [6 7 0 0 8]
 [0 0 0 0 0]]

print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='multiply').numpy())

[[2.   2.   2.46 2.  ]
 [2.   2.   2.   2.46]]

print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='add').numpy())

[[2.   2.   3.23 2.  ]
 [2.   2.   2.   3.23]]
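
For the 2D case without reduce, the rule is: with dim=0, out[index[i][j]][j] = src[i][j]; with dim=1, out[i][index[i][j]] = src[i][j]. Only positions covered by index are written. A plain-Python sketch on nested lists, with a hypothetical function name:

```python
def scatter_2d(dst, dim, index, src):
    """Copy src values into a copy of dst at positions chosen by index."""
    out = [row[:] for row in dst]
    for i, idx_row in enumerate(index):
        for j, target in enumerate(idx_row):
            if dim == 0:
                out[target][j] = src[i][j]   # index selects the row
            else:
                out[i][target] = src[i][j]   # index selects the column
    return out

src = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
zeros = [[0] * 5 for _ in range(3)]
print(scatter_2d(zeros, 0, [[0, 1, 2, 0]], src))
print(scatter_2d(zeros, 1, [[0, 1, 2], [0, 1, 4]], src))
```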

Source code in tinygrad/tensor.py

def scatter(self, dim:int, index:Tensor, src:Tensor|ConstType, reduce:Literal['multiply', 'add']|None=None) -> Tensor:
  """
  Scatters `src` values along an axis specified by `dim`.
  Apply `add` or `multiply` reduction operation with `reduce`.

  NOTE: To use the `reduce` argument with a Tensor `src`, see `Tensor.scatter_reduce`.

  ```python exec="true" source="above" session="tensor" result="python"
  src = Tensor.arange(1, 11).reshape(2, 5)
  print(src.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  index = Tensor([[0, 1, 2, 0]])
  print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(0, index, src).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  index = Tensor([[0, 1, 2], [0, 1, 4]])
  print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(1, index, src).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='multiply').numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='add').numpy())
  ```
  """
  if reduce not in {None, "add", "multiply"}: raise TypeError(f"{reduce=} must be one of None, 'multiply', or 'add'")
  if reduce and isinstance(src, Tensor): raise TypeError("Tensor src is not supported with reduce arg. see scatter_reduce")
  if not isinstance(src, Tensor): src = index.full_like(src, device=self.device, dtype=self.dtype)
  if reduce == "add": return self.scatter_reduce(dim, index, src, "sum", include_self=True)
  if reduce == "multiply": return self.scatter_reduce(dim, index, src, "prod", include_self=True)
  src, mask = self._pre_scatter(dim, index, src)
  return _masked_setitem(self, src, mask, (-1,))

scatter_reduce ¤

scatter_reduce(
    dim: int,
    index: Tensor,
    src: Tensor,
    reduce: Literal["sum", "prod", "mean", "amax", "amin"],
    include_self: bool = True,
) -> Tensor

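Scatters src values along an axis specified by dim. Apply a "sum", "prod", "mean", "amax", or "amin" reduction operation with reduce.

Set include_self=False to exclude values in the self tensor from the reduction.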

src = Tensor.arange(1, 11).cast(dtypes.float).reshape(2, 5)
print(src.numpy())
index = Tensor([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
print(index.numpy())

[[ 1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10.]]
[[0 0 0 0 0]
 [0 0 0 0 0]]

print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='sum').numpy())

[[ 8. 10. 12. 14. 16.]]

print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='prod').numpy())

[[ 6. 14. 24. 36. 50.]]

print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='mean', include_self=False).numpy())

[[3.5 4.5 5.5 6.5 7.5]]

print(Tensor([[-10, 20, 0, 5, 10]], dtype=src.dtype).scatter_reduce(0, index, src, reduce='amax').numpy())

[[ 6. 20.  8.  9. 10.]]

print(Tensor([[-10, 20, 0, 5, 10]], dtype=src.dtype).scatter_reduce(0, index, src, reduce='amin').numpy())

[[-10.   2.   0.   4.   5.]]
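
When several src values land on the same target cell, they are combined with the chosen reduction, optionally together with the value already stored there. The sketch below illustrates this for dim=0 on 2D nested lists in plain Python. The function name and the bucket-based approach are for illustration only and differ from the masked formulation in the source below.

```python
import math

REDUCERS = {
    "sum": sum,
    "prod": math.prod,
    "mean": lambda vals: sum(vals) / len(vals),
    "amax": max,
    "amin": min,
}

def scatter_reduce_dim0(dst, index, src, reduce, include_self=True):
    rows, cols = len(dst), len(dst[0])
    buckets = [[[] for _ in range(cols)] for _ in range(rows)]
    for i, idx_row in enumerate(index):
        for j, target in enumerate(idx_row):
            buckets[target][j].append(src[i][j])   # collect contributions
    out = [row[:] for row in dst]
    for r in range(rows):
        for c in range(cols):
            vals = buckets[r][c]
            if not vals:
                continue                           # untouched cells keep their value
            if include_self:
                vals = [dst[r][c]] + vals
            out[r][c] = REDUCERS[reduce](vals)
    return out

src = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
index = [[0] * 5, [0] * 5]
ones = [[1.0] * 5]
print(scatter_reduce_dim0(ones, index, src, "sum"))
print(scatter_reduce_dim0(ones, index, src, "mean", include_self=False))
```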

Source code in tinygrad/tensor.py

def scatter_reduce(self, dim:int, index:Tensor, src:Tensor, reduce:Literal["sum", "prod", "mean", "amax", "amin"],
                   include_self:bool=True) -> Tensor:
  """
  Scatters `src` values along an axis specified by `dim`.
  Apply `"sum"`, `"prod"`, `"mean"`, `"amax"`, or `"amin"` reduction operations with `reduce`.

  Set `include_self=False` to exclude values in the `self` Tensor from the reduction.

  ```python exec="true" source="above" session="tensor" result="python"
  src = Tensor.arange(1, 11).cast(dtypes.float).reshape(2, 5)
  print(src.numpy())
  index = Tensor([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
  print(index.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='sum').numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='prod').numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.ones(1, 5, dtype=src.dtype).scatter_reduce(0, index, src, reduce='mean', include_self=False).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor([[-10, 20, 0, 5, 10]], dtype=src.dtype).scatter_reduce(0, index, src, reduce='amax').numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor([[-10, 20, 0, 5, 10]], dtype=src.dtype).scatter_reduce(0, index, src, reduce='amin').numpy())
  ```
  """
  src, mask = self._pre_scatter(dim, index, src)
  def _inv_mask(a:Tensor|ConstType, b:Tensor|ConstType) -> Tensor: return mask.any(-1).logical_not().where(a, b)
  # TODO: should not overwrite dtype here?
  if reduce == "sum": return mask.where(src, 0).sum(-1, dtype=self.dtype).add(self if include_self else _inv_mask(self, 0))
  if reduce == "prod": return mask.where(src, 1).prod(-1, dtype=self.dtype).mul(self if include_self else _inv_mask(self, 1))
  if reduce == "amax": return mask.where(src, m := dtypes.min(src.dtype)).max(-1).maximum(self if include_self else _inv_mask(self, m))
  if reduce == "amin": return mask.where(src, m := dtypes.max(src.dtype)).min(-1).minimum(self if include_self else _inv_mask(self, m))
  if reduce == "mean":
    count = mask.where(1, 0).sum(-1, dtype=self.dtype).add(1 if include_self else _inv_mask(1, 0))
    return mask.where(src, 0).sum(-1, dtype=self.dtype).add(self if include_self else _inv_mask(self, 0)).div(count)
  raise RuntimeError(f"{reduce=} must be one of 'sum', 'prod', 'mean', 'amax', 'amin'")

masked_select ¤

masked_select(mask)

Selects elements from self based on the boolean mask.

t = Tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
mask = Tensor([[True, False, True], [False, True, False], [False, False, True]])
print(t.numpy())
print(mask.numpy())

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ True False  True]
 [False  True False]
 [False False  True]]

print(t.masked_select(mask).numpy())

[0 2 4 8]
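
The source below avoids data-dependent loops by turning the mask into gather indices with two cumulative sums and a scatter-add. The following plain-Python sketch traces that idea on a flattened mask. `masked_select_indices` is a hypothetical helper, and skipping running counts equal to the total is how this sketch treats positions that fall outside the counts buffer.

```python
from itertools import accumulate

def masked_select_indices(mask_flat):
    """Flat positions of the True entries, via cumsum / scatter-add / cumsum."""
    running = list(accumulate(int(m) for m in mask_flat))
    total = running[-1] if running else 0
    counts = [0] * total
    for r in running:
        if r < total:          # out-of-range targets are skipped in this sketch
            counts[r] += 1     # scatter-add of 1 at position r
    return list(accumulate(counts))

t = [0, 1, 2, 3, 4, 5, 6, 7, 8]
mask = [True, False, True, False, True, False, False, False, True]
idxs = masked_select_indices(mask)
print(idxs)
print([t[i] for i in idxs])
```

After the scatter-add, counts[k] holds how many positions have seen exactly k selected elements so far, so the running total up to k is the position of the element with rank k.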

Source code in tinygrad/tensor.py

def masked_select(self, mask):
  """
  Selects elements from `self` based on the boolean `mask`.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
  mask = Tensor([[True, False, True], [False, True, False], [False, False, True]])
  print(t.numpy())
  print(mask.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.masked_select(mask).numpy())
  ```
  """
  if not dtypes.is_bool(mask.dtype): raise RuntimeError(f"masked_select expects bool mask tensor, got {mask.dtype}")
  x, mask = self.flatten(), mask._broadcast_to(self.shape).flatten()
  mask_cumsum = mask.cumsum()
  counts = Tensor.zeros(mask_cumsum[-1].item(), dtype=dtypes.int32)
  idxs = counts.scatter(0, mask_cumsum, 1, reduce='add').cumsum()
  return x[idxs]

sort ¤

sort(
    dim: int = -1, descending: bool = False
) -> tuple[Tensor, Tensor]

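Performs a bitonic sort on the tensor along the specified dimension.

Order of indices for equivalent elements is always preserved.

See: https://en.wikipedia.org/wiki/Bitonic_sorter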

t = Tensor([[0.1, 0.5, 1.2, 3.4, 2.1], [2.2, 1.9, 0.3, 4.5, 0.8]])
print(t.numpy())

[[0.1 0.5 1.2 3.4 2.1]
 [2.2 1.9 0.3 4.5 0.8]]

sorted_values, indices = t.sort(dim=1, descending=True)
print(sorted_values.numpy())
print(indices.numpy())

[[3.4 2.1 1.2 0.5 0.1]
 [4.5 2.2 1.9 0.8 0.3]]
[[3 4 2 1 0]
 [3 0 1 4 2]]

Source code in tinygrad/tensor.py

def sort(self, dim:int=-1, descending:bool=False) -> tuple[Tensor, Tensor]:
  """
  Performs a bitonic sort on the tensor along the specified dimension.

  Order of indices for equivalent elements is always preserved.

  See: https://en.wikipedia.org/wiki/Bitonic_sorter

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[0.1, 0.5, 1.2, 3.4, 2.1], [2.2, 1.9, 0.3, 4.5, 0.8]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  sorted_values, indices = t.sort(dim=1, descending=True)
  print(sorted_values.numpy())
  print(indices.numpy())
  ```
  """
  x, dim = self, self._resolve_dim(dim)
  # pad to power of 2
  orig_len = x.shape[dim]
  n_stages = math.ceil(math.log2(orig_len))
  fill_value = dtypes.min(x.dtype) if descending else dtypes.max(x.dtype)
  pads = tuple((0, 2**n_stages - orig_len) if i == dim else None for i in range(x.ndim))
  x = x.pad(pads, value=fill_value).unflatten(dim, (2,)*n_stages)
  # https://en.wikipedia.org/wiki/Bitonic_sorter#/media/File:BitonicSort1.svg
  for stage in range(1, n_stages+1):
    if stage != n_stages:
      # flip so arrows of green boxes point the same way as blue boxes
      crossover_dim = dim + n_stages - stage - 1
      blue_box, green_box = x.split(1, crossover_dim)
      flip_dims = tuple(-i for i in range(1, stage+1+(self.ndim-dim)))
      x = (blue_box.cat(green_box.flip(flip_dims), dim=crossover_dim)).contiguous()
    for substage in range(stage-1, -1, -1):
      partner_dim = dim + n_stages - substage - 1
      x_top, x_bottom = x.split(1, partner_dim)
      x_larger, x_smaller = x_top.maximum(x_bottom), x_top.minimum(x_bottom)
      x = (x_larger.cat(x_smaller, dim=partner_dim) if descending else x_smaller.cat(x_larger, dim=partner_dim)).contiguous()
    if stage != n_stages:
      # flip wires back to undo the crossover
      blue_box, flipped_green_box = x.split(1, crossover_dim)
      x = blue_box.cat(flipped_green_box.flip(flip_dims), dim=crossover_dim)
  x = x.flatten(dim, dim+n_stages-1).shrink(tuple((0, orig_len) if i == dim else None for i in range(x.ndim)))
  # compute indices for sorted values
  idx = Tensor.arange(orig_len, requires_grad=False, device=self.device).reshape(tuple(orig_len if i == dim else 1 for i in range(x.ndim)))
  idx = idx.expand(x.shape)
  def compute_counts(t:Tensor): return ((idx.unsqueeze(dim) <= idx.unsqueeze(dim+1)) & (t.unsqueeze(dim) == t.unsqueeze(dim+1))).sum(dim+1)
  count_orig, count_sorted = compute_counts(self), compute_counts(x)
  cond = (self.unsqueeze(dim+1) == x.unsqueeze(dim)) & (count_orig.unsqueeze(dim+1) == count_sorted.unsqueeze(dim))
  idx = (cond * idx.unsqueeze(dim+1)).sum(dim)
  return x, idx
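
For readers unfamiliar with bitonic sorting, here is a minimal sketch of the textbook iterative compare-exchange network on a Python list. It is not the tensorized version shown above: the function name `bitonic_sort` is hypothetical, and it always pads with positive infinity and reverses the result for descending order, whereas the source above picks the fill value based on `descending`.

```python
import math

def bitonic_sort(values, descending=False):
    """Sort a list with an iterative bitonic compare-exchange network."""
    n = len(values)
    if n == 0:
        return []
    # Pad the length up to the next power of two; the padding sorts to the end.
    size = 1 << math.ceil(math.log2(n))
    data = list(values) + [math.inf] * (size - n)
    k = 2
    while k <= size:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:         # distance between compared partners
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending_block = (i & k) == 0
                    if ascending_block:
                        out_of_order = data[i] > data[partner]
                    else:
                        out_of_order = data[i] < data[partner]
                    if out_of_order:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    result = data[:n]         # drop the padding
    return result[::-1] if descending else result

print(bitonic_sort([0.1, 0.5, 1.2, 3.4, 2.1], descending=True))  # [3.4, 2.1, 1.2, 0.5, 0.1]
```

Every comparison in a given pass touches disjoint pairs, which is what makes the network a good fit for data-parallel tensor operations.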

topk ¤

topk(
    k: int,
    dim: int = -1,
    largest: bool = True,
    sorted_: bool = True,
) -> tuple[Tensor, Tensor]

Computes the top-k elements of the tensor along the specified dim.

Order of indices for equivalent elements is always preserved.

t = Tensor([[0.1, 0.5, 1.2, 3.4, 2.1], [2.2, 1.9, 0.3, 4.5, 0.8]])
print(t.numpy())

[[0.1 0.5 1.2 3.4 2.1]
 [2.2 1.9 0.3 4.5 0.8]]

topk_values, topk_indices = t.topk(2, dim=1)
print(topk_values.numpy())
print(topk_indices.numpy())

[[3.4 2.1]
 [4.5 2.2]]
[[3 4]
 [3 0]]

Source code in tinygrad/tensor.py

def topk(self, k:int, dim:int=-1, largest:bool=True, sorted_:bool=True) -> tuple[Tensor, Tensor]:
  """
  Computes the top-k elements of the tensor along the specified `dim`.

  Order of indices for equivalent elements is always preserved.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[0.1, 0.5, 1.2, 3.4, 2.1], [2.2, 1.9, 0.3, 4.5, 0.8]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  topk_values, topk_indices = t.topk(2, dim=1)
  print(topk_values.numpy())
  print(topk_indices.numpy())
  ```
  """
  if not sorted_: raise NotImplementedError("topk with sorted_=False is not supported")
  if k > self.shape[dim:=self._resolve_dim(dim)]: raise ValueError(f"selected index {k=} is out of range")
  x, idx = self.sort(dim, descending=largest)
  shrink_to_k = tuple((0, k) if i == dim else None for i in range(self.ndim))
  return x.shrink(shrink_to_k), idx.shrink(shrink_to_k)
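
The source above reduces top-k to "sort, then keep the first k entries". A minimal list-based sketch of that idea follows; it uses Python's built-in stable sort instead of a bitonic network, and the function name `topk` here is only an illustration.

```python
def topk(values, k, largest=True):
    """Return the k largest (or smallest) values and their original indices."""
    if k > len(values):
        raise ValueError(f"selected index {k=} is out of range")
    # Sort the indices by value; Python's sort is stable, so ties keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    top = order[:k]
    return [values[i] for i in top], top

print(topk([0.1, 0.5, 1.2, 3.4, 2.1], 2))  # ([3.4, 2.1], [3, 4])
```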

Neural Network (functional) ¤

linear ¤

linear(
    weight: Tensor, bias: Tensor | None = None
) -> Tensor

Applies a linear transformation to self using weight and bias.

See: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

t = Tensor([[1, 2], [3, 4]])
weight = Tensor([[1, 2], [3, 4]])
bias = Tensor([1, 2])
print(t.linear(weight, bias).numpy())

[[ 8 12]
 [16 24]]

Source code in tinygrad/tensor.py

def linear(self, weight:Tensor, bias:Tensor|None=None) -> Tensor:
  """
  Applies a linear transformation to `self` using `weight` and `bias`.

  See: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2], [3, 4]])
  weight = Tensor([[1, 2], [3, 4]])
  bias = Tensor([1, 2])
  print(t.linear(weight, bias).numpy())
  ```
  """
  x = self.mul(weight) if len(weight.shape) == 1 else self.dot(weight)
  return x.add(bias) if bias is not None else x
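
To check the example output by hand, the following sketch recomputes it with nested lists. It covers only the 2-D weight case; the helper is illustrative and not part of the library. Note that the source above calls `self.dot(weight)` without a transpose, so the weight is laid out as (in_features, out_features), whereas `torch.nn.Linear` stores its weight as (out_features, in_features).

```python
def linear(x, weight, bias=None):
    """Compute x @ weight (+ bias) for nested lists; weight is (in_features, out_features)."""
    n_in, n_out = len(weight), len(weight[0])
    out = [[sum(row[i] * weight[i][j] for i in range(n_in)) for j in range(n_out)] for row in x]
    if bias is not None:
        # The bias is broadcast across rows.
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out

print(linear([[1, 2], [3, 4]], [[1, 2], [3, 4]], [1, 2]))  # [[8, 12], [16, 24]]
```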

sequential ¤

sequential(ll: list[Callable[[Tensor], Tensor]]) -> Tensor

Applies a sequence of functions to self, chaining the output of each function to the input of the next.

t = Tensor([1, 2, 3])
print(t.sequential([lambda x: x * 2, lambda x: x + 1]).numpy())

[3 5 7]

Source code in tinygrad/tensor.py

def sequential(self, ll:list[Callable[[Tensor], Tensor]]) -> Tensor:
  """
  Applies a sequence of functions to `self` chaining the output of each function to the input of the next.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([1, 2, 3])
  print(t.sequential([lambda x: x * 2, lambda x: x + 1]).numpy())
  ```
  """
  return functools.reduce(lambda x,f: f(x), ll, self)
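
The `functools.reduce` call in the source is a left fold over the function list. The sketch below shows the same fold on a plain integer so it can be run without any tensor library; the free function `sequential` is a stand-in for the method.

```python
import functools

def sequential(value, functions):
    """Feed `value` through each function in order, left to right."""
    return functools.reduce(lambda acc, fn: fn(acc), functions, value)

print(sequential(3, [lambda x: x * 2, lambda x: x + 1]))  # 7
print(sequential(3, []))                                  # 3
```

An empty list returns the input unchanged because the input is the fold's initial value.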

layernorm ¤

layernorm(
    axis: int | tuple[int, ...] = -1, eps: float = 1e-05
) -> Tensor

Applies Layer Normalization over a mini-batch of inputs.

Described: https://paperswithcode.com/method/layer-normalization
Paper: https://arxiv.org/abs/1607.06450v1

t = Tensor.randn(8, 10, 16) * 2 + 8
print(t.mean().item(), t.std().item())

7.923046112060547 2.0072739124298096

t = t.layernorm()
print(t.mean().item(), t.std().item())

-5.940565817041943e-09 1.0003893375396729

Source code in tinygrad/tensor.py

def layernorm(self, axis:int|tuple[int,...]=-1, eps:float=1e-5) -> Tensor:
  """
  Applies Layer Normalization over a mini-batch of inputs.

  - Described: https://paperswithcode.com/method/layer-normalization
  - Paper: https://arxiv.org/abs/1607.06450v1

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.randn(8, 10, 16) * 2 + 8
  print(t.mean().item(), t.std().item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = t.layernorm()
  print(t.mean().item(), t.std().item())
  ```
  """
  y = (self - self.mean(axis, keepdim=True))
  return y.mul((y*y).mean(axis, keepdim=True).add(eps).rsqrt())
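
The two lines of source above subtract the mean and multiply by the reciprocal square root of the mean squared deviation plus `eps`. A minimal sketch for a single 1-D row follows; it mirrors the arithmetic only and does not handle multiple axes.

```python
import math

def layernorm(row, eps=1e-5):
    """Normalize one row to zero mean and (approximately) unit variance."""
    mean = sum(row) / len(row)
    centered = [v - mean for v in row]
    # Mean of squared deviations, i.e. the biased (population) variance.
    variance = sum(c * c for c in centered) / len(row)
    scale = 1.0 / math.sqrt(variance + eps)
    return [c * scale for c in centered]

out = layernorm([1.0, 2.0, 3.0, 4.0])
print(out)
```

Because `eps` is added inside the square root, the output variance is slightly below 1 rather than exactly 1.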

batchnorm ¤

batchnorm(
    weight: Tensor | None,
    bias: Tensor | None,
    mean: Tensor,
    invstd: Tensor,
    axis: int | tuple[int, ...] = 1,
) -> Tensor

Applies Batch Normalization over a mini-batch of inputs.

Described: https://paperswithcode.com/method/batch-normalization
Paper: https://arxiv.org/abs/1502.03167

t = Tensor.randn(8, 4, 16, 16) * 2 + 8
print(t.mean().item(), t.std().item())

8.030410766601562 1.9699476957321167

t = t.batchnorm(None, None, t.mean(axis=(0,2,3)), t.var(axis=(0,2,3)).add(1e-5).rsqrt())
print(t.mean().item(), t.std().item())

6.026898518030066e-07 0.9998166561126709

Source code in tinygrad/tensor.py

def batchnorm(self, weight:Tensor|None, bias:Tensor|None, mean:Tensor, invstd:Tensor, axis:int|tuple[int, ...]=1) -> Tensor:
  """
  Applies Batch Normalization over a mini-batch of inputs.

  - Described: https://paperswithcode.com/method/batch-normalization
  - Paper: https://arxiv.org/abs/1502.03167

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.randn(8, 4, 16, 16) * 2 + 8
  print(t.mean().item(), t.std().item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = t.batchnorm(None, None, t.mean(axis=(0,2,3)), t.var(axis=(0,2,3)).add(1e-5).rsqrt())
  print(t.mean().item(), t.std().item())
  ```
  """
  axis_ = argfix(axis)
  shape = tuple(s if ax in axis_ else 1 for ax, s in enumerate(self.shape))
  x = self - mean.reshape(shape)
  if weight is not None: x = x * weight.reshape(shape)
  ret = x.mul(invstd.reshape(shape) if len(invstd.shape) == len(axis_) else invstd)
  return (ret + bias.reshape(shape)) if bias is not None else ret
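
As the signature shows, batchnorm expects the caller to supply `mean` and `invstd`. The sketch below shows that contract on a small (batch, channel) nested list. The helper names are hypothetical, and `channel_stats` uses the population variance as a simplification.

```python
import math

def channel_stats(x, eps=1e-5):
    """Per-channel mean and inverse standard deviation for a (batch, channel) list."""
    n = len(x)
    columns = list(zip(*x))
    mean = [sum(col) / n for col in columns]
    var = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, mean)]
    invstd = [1.0 / math.sqrt(v + eps) for v in var]
    return mean, invstd

def batchnorm(x, mean, invstd, weight=None, bias=None):
    """Apply (x - mean) * weight * invstd + bias independently per channel."""
    out = []
    for sample in x:
        row = []
        for c, v in enumerate(sample):
            y = (v - mean[c]) * invstd[c]
            if weight is not None:
                y *= weight[c]
            if bias is not None:
                y += bias[c]
            row.append(y)
        out.append(row)
    return out

x = [[1.0, 10.0], [3.0, 30.0]]
mean, invstd = channel_stats(x)
print(batchnorm(x, mean, invstd))  # each channel is close to [-1, 1]
```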

dropout ¤

dropout(p=0.5) -> Tensor

Applies dropout to self.

Note

dropout is only applied when Tensor.training is True.

Described: https://paperswithcode.com/method/dropout
Paper: https://jmlr.org/papers/v15/srivastava14a.html

Tensor.manual_seed(42)
t = Tensor.randn(2, 2)
with Tensor.train():
  print(t.dropout().numpy())

[[ 0.      2.17  ]
 [ 0.     -0.1682]]

Source code in tinygrad/tensor.py

def dropout(self, p=0.5) -> Tensor:
  """
  Applies dropout to `self`.

  NOTE: dropout is only applied when `Tensor.training` is `True`.

  - Described: https://paperswithcode.com/method/dropout
  - Paper: https://jmlr.org/papers/v15/srivastava14a.html

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 2)
  with Tensor.train():
    print(t.dropout().numpy())
  ```
  """
  if not Tensor.training or p == 0: return self
  return (Tensor.rand_like(self, requires_grad=False, dtype=dtypes.default_float, contiguous=False) >= p).contiguous().where(self, 0) / (1.0 - p)
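
The source line above keeps an element when a uniform random draw is at least `p` and rescales the survivors by `1 / (1 - p)`, which is why the surviving values in the example output are twice the originals. A list-based sketch of the same rule follows; the `training` and `rng` parameters are hypothetical stand-ins for `Tensor.training` and the library's random generator.

```python
import random

def dropout(values, p=0.5, training=True, rng=None):
    """Zero each element with probability p and rescale the rest by 1 / (1 - p)."""
    if not training or p == 0:
        return list(values)
    rng = rng or random.Random()
    # Keep an element when the uniform draw is >= p, matching the comparison in the source.
    return [v / (1.0 - p) if rng.random() >= p else 0.0 for v in values]

values = [1.0, 2.0, 3.0, 4.0]
print(dropout(values, p=0.5, rng=random.Random(0)))
print(dropout(values, training=False))  # [1.0, 2.0, 3.0, 4.0]
```

The rescaling keeps the expected value of each element unchanged, so no correction is needed at inference time.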

one_hot ¤

one_hot(num_classes: int = -1) -> Tensor

Converts self to a one-hot tensor.

num_classes defaults to -1, which means num_classes will be inferred as max(self) + 1.

t = Tensor([0, 1, 3, 3, 4])
print(t.one_hot(5).numpy())

[[1 0 0 0 0]
 [0 1 0 0 0]
 [0 0 0 1 0]
 [0 0 0 1 0]
 [0 0 0 0 1]]

Source code in tinygrad/tensor.py

def one_hot(self, num_classes:int=-1) -> Tensor:
  """
  Converts `self` to a one-hot tensor.

  `num_classes` defaults to -1, which means num_classes will be inferred as max(self) + 1.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0, 1, 3, 3, 4])
  print(t.one_hot(5).numpy())
  ```
  """
  if not dtypes.is_int(self.dtype): raise RuntimeError(f"expect integer dtype, getting {self.dtype=}")
  if num_classes == -1: num_classes = (self.max()+1).item()
  return self[..., None]._one_hot_along_dim(num_classes).where(1, 0)
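
A short list-based sketch of the encoding, including the `num_classes=-1` inference described above. It is an illustration for 1-D integer labels, not the library implementation.

```python
def one_hot(labels, num_classes=-1):
    """Encode integer labels as rows with a single 1 at the label's position."""
    if num_classes == -1:
        num_classes = max(labels) + 1  # inferred from the largest label
    return [[1 if c == label else 0 for c in range(num_classes)] for label in labels]

for row in one_hot([0, 1, 3, 3, 4]):
    print(row)
```

Inferring the class count needs the actual maximum value, which is why the source calls `.item()` in that branch; passing `num_classes` explicitly avoids that.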

scaled_dot_product_attention ¤

scaled_dot_product_attention(
    key: Tensor,
    value: Tensor,
    attn_mask: Tensor | None = None,
    dropout_p: float = 0.0,
    is_causal: bool = False,
) -> Tensor

Computes scaled dot-product attention. self is the query tensor, key is the key tensor, and value is the value tensor.

Described: https://paperswithcode.com/method/scaled
Paper: https://arxiv.org/abs/1706.03762v7

q = Tensor.randn(2, 4, 8)
k = Tensor.randn(2, 4, 8)
v = Tensor.randn(2, 4, 8)
print(q.scaled_dot_product_attention(k, v).numpy())

[[[-0.1425 -0.1433 -0.3625  0.8853 -0.3129  1.0271 -0.0019  0.2445]
  [-0.7137  0.2617  1.1393  0.692   0.0461  0.1132  0.391  -0.3563]
  [ 0.4718  0.6791  0.8956  0.9387 -0.7198  0.753   0.5702  0.2661]
  [-1.0183  0.005   0.9208  0.6447  0.2658  0.0411  0.2314 -0.4636]]

 [[ 0.2928 -0.3364 -0.1937 -0.0755 -0.6196 -0.7339  0.8431 -0.3794]
  [ 0.5915  0.3565 -0.6987  0.241   0.2624 -0.1074 -0.3026 -0.3574]
  [ 0.3176 -0.4436 -0.3136 -0.5334 -0.5756 -0.851   0.9595 -0.4201]
  [ 0.4378  0.0234 -0.0984  0.4847 -0.3579 -0.3998  0.3781 -0.2338]]]

Source code in tinygrad/tensor.py

def scaled_dot_product_attention(self, key:Tensor, value:Tensor, attn_mask:Tensor|None=None, dropout_p:float=0.0, is_causal:bool=False) -> Tensor:
  """
  Computes scaled dot-product attention.
  `self` is the query tensor, `key` is the key tensor, and `value` is the value tensor.

  - Described: https://paperswithcode.com/method/scaled
  - Paper: https://arxiv.org/abs/1706.03762v7

  ```python exec="true" source="above" session="tensor" result="python"
  q = Tensor.randn(2, 4, 8)
  k = Tensor.randn(2, 4, 8)
  v = Tensor.randn(2, 4, 8)
  print(q.scaled_dot_product_attention(k, v).numpy())
  ```
  """
  # NOTE: it also works when `key` and `value` have symbolic shape.
  assert all_int(self.shape), f"does not support symbolic shape {self.shape}"
  qk = self.matmul(key.transpose(-2,-1), dtype=least_upper_dtype(self.dtype, key.dtype, dtypes.float32)) / math.sqrt(self.shape[-1])
  # handle attention mask
  if is_causal:
    if attn_mask is not None: raise RuntimeError("cannot set attn_mask when is_causal=True")
    attn_mask = qk.ones_like(requires_grad=False, device=self.device, dtype=dtypes.bool).tril()
  if attn_mask is not None:
    if attn_mask.dtype == dtypes.bool: attn_mask = attn_mask.where(0, -float("inf"))
    qk = qk + attn_mask
  return qk.cast(self.dtype).softmax(-1).dropout(dropout_p) @ value
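
The computation is softmax(Q Kᵀ / √d) V, with masked positions set to negative infinity before the softmax. A minimal single-head sketch on nested lists follows, under the assumption of 2-D inputs and no dropout; the names `attention` and `softmax` are local helpers.

```python
import math

def softmax(row):
    """Numerically stable softmax of one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, is_causal=False):
    """q: (Lq, d), k: (Lk, d), v: (Lk, dv) as nested lists."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot product of query i with every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if is_causal:
            # Lower-triangular mask: query i may only attend to keys j <= i.
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(q, k, v, is_causal=True)[0])  # the first query sees only the first key
```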

binary_crossentropy ¤

binary_crossentropy(
    Y: Tensor, reduction: ReductionStr = "mean"
) -> Tensor

Computes the binary cross-entropy loss between self and Y.

See: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html

t = Tensor([0.1, 0.9, 0.2])
Y = Tensor([0, 1, 0])
print(t.binary_crossentropy(Y).item())

0.14462155103683472

Source code in tinygrad/tensor.py

def binary_crossentropy(self, Y:Tensor, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the binary cross-entropy loss between `self` and `Y`.

  See: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0.1, 0.9, 0.2])
  Y = Tensor([0, 1, 0])
  print(t.binary_crossentropy(Y).item())
  ```
  """
  return (-Y*self.log() - (1-Y)*(1-self).log())._do_reduction(reduction)
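
The example value can be reproduced by hand from the formula in the source, -Y·log(p) - (1 - Y)·log(1 - p), averaged over the elements. The sketch below does this with the standard library and implements only the "mean" reduction.

```python
import math

def binary_crossentropy(probs, targets):
    """Mean binary cross-entropy for probabilities strictly between 0 and 1."""
    losses = [-y * math.log(p) - (1 - y) * math.log(1 - p) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)

print(binary_crossentropy([0.1, 0.9, 0.2], [0, 1, 0]))  # approximately 0.1446
```

Probabilities of exactly 0 or 1 make `log` undefined here, which is one reason to prefer the logits variant when the raw scores are available.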

binary_crossentropy_logits ¤

binary_crossentropy_logits(
    Y: Tensor, reduction: ReductionStr = "mean"
) -> Tensor

Computes the binary cross-entropy loss between self and Y, where self is the logits.

See: https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

t = Tensor([-1, 2, -3])
Y = Tensor([0, 1, 0])
print(t.binary_crossentropy_logits(Y).item())

0.16292566061019897

Source code in tinygrad/tensor.py

def binary_crossentropy_logits(self, Y:Tensor, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the binary cross-entropy loss between `self` and `Y` where `self` is logits.

  See: https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([-1, 2, -3])
  Y = Tensor([0, 1, 0])
  print(t.binary_crossentropy_logits(Y).item())
  ```
  """
  return (self.maximum(0) - Y * self + (1 + self.abs().neg().exp()).log())._do_reduction(reduction)
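
The expression in the source, max(x, 0) - Y·x + log(1 + exp(-|x|)), is an algebraic rewrite of sigmoid followed by binary cross-entropy that avoids overflow for large |x|. The sketch below compares it with the naive form on the example inputs; both function names are hypothetical.

```python
import math

def bce_logits_stable(logits, targets):
    """Mean BCE from logits using the overflow-safe formulation."""
    losses = [max(x, 0) - y * x + math.log(1 + math.exp(-abs(x))) for x, y in zip(logits, targets)]
    return sum(losses) / len(losses)

def bce_logits_naive(logits, targets):
    """Sigmoid followed by BCE; only safe for moderate logits."""
    probs = [1 / (1 + math.exp(-x)) for x in logits]
    losses = [-y * math.log(p) - (1 - y) * math.log(1 - p) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)

logits, targets = [-1, 2, -3], [0, 1, 0]
print(bce_logits_stable(logits, targets))  # approximately 0.1629
print(bce_logits_naive(logits, targets))   # same value up to rounding
print(bce_logits_stable([1000.0], [1]))    # 0.0, no overflow
```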

sparse_categorical_crossentropy ¤

sparse_categorical_crossentropy(
    Y: Tensor,
    ignore_index: int = -1,
    label_smoothing=0.0,
    reduction: ReductionStr = "mean",
) -> Tensor

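Computes the sparse categorical cross-entropy loss between self and Y.

Note

self is the logits and Y is the target labels. Unlike PyTorch, this function expects the class axis to be -1.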

See: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.sparse_categorical_crossentropy(Y).item())

0.09391524642705917

Source code in tinygrad/tensor.py

def sparse_categorical_crossentropy(self, Y:Tensor, ignore_index:int=-1, label_smoothing=0.0, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the sparse categorical cross-entropy loss between `self` and `Y`.

  NOTE: `self` is logits and `Y` is the target labels.
  NOTE: unlike PyTorch, this function expects the class axis to be -1

  See: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.sparse_categorical_crossentropy(Y).item())
  ```
  """
  assert 0.0 <= label_smoothing <= 1.0, "label_smoothing must be in [0.0, 1.0]"
  assert reduction in ("mean", "sum", "none"), "reduction must be one of ['mean', 'sum', 'none']"
  log_probs, loss_mask = self.log_softmax(), (Y != ignore_index) if ignore_index != -1 else Y.ones_like(dtype=dtypes.bool)
  y_counted = Y.to(self.device).flatten().reshape(-1, 1)._one_hot_along_dim(self.shape[-1])
  y = (y_counted * loss_mask.reshape(-1, 1)).reshape(*Y.shape, self.shape[-1])
  smoothing = label_smoothing * (log_probs.mean(-1) * loss_mask)
  unreduced = ((1 - label_smoothing) * (log_probs * y).sum(-1) + smoothing)
  # NOTE: because of ignore_index, we can't use Tensor.mean (so can't use `_do_reduction` here)
  return -(unreduced.sum() / loss_mask.sum() if reduction == "mean" else (unreduced.sum() if reduction == "sum" else unreduced))
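
As the comment in the source notes, the "mean" reduction divides by the number of non-ignored labels rather than by the batch size. The sketch below shows that behaviour on nested lists. It omits label smoothing, and it uses `ignore_index=None` to mean "ignore nothing", which is a simplification of the `-1` default in the real signature.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def sparse_categorical_crossentropy(logits, labels, ignore_index=None):
    """Mean negative log-probability of the target class over non-ignored rows."""
    total, kept = 0.0, 0
    for row, label in zip(logits, labels):
        if ignore_index is not None and label == ignore_index:
            continue  # masked rows contribute neither to the sum nor to the count
        total += -log_softmax(row)[label]
        kept += 1
    return total / kept

logits = [[-1, 2, -3], [1, -2, 3]]
print(sparse_categorical_crossentropy(logits, [1, 2]))                   # approximately 0.0939
print(sparse_categorical_crossentropy(logits, [1, 9], ignore_index=9))  # approximately 0.0550
```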

cross_entropy ¤

cross_entropy(
    Y: Tensor,
    reduction: ReductionStr = "mean",
    label_smoothing: float = 0.0,
) -> Tensor

Computes the cross-entropy loss between the input logits and the target.

Note

self is the logits and Y is the target labels or class probabilities.

See: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.cross_entropy(Y).item())

0.09391524642705917

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.cross_entropy(Y, reduction='none').numpy())

[0.055  0.1328]

Source code in tinygrad/tensor.py

def cross_entropy(self, Y:Tensor, reduction:ReductionStr="mean", label_smoothing:float=0.0) -> Tensor:
  """
  Compute the cross entropy loss between input logits and target.

  NOTE: `self` are logits and `Y` are the target labels or class probabilities.

  See: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.cross_entropy(Y).item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.cross_entropy(Y, reduction='none').numpy())
  ```
  """
  assert 0.0 <= label_smoothing <= 1.0, "label_smoothing must be in [0.0, 1.0]"
  Y = Y.one_hot(num_classes=cast(int, self.shape[1])) if Y.ndim < 2 else Y
  Y = (1 - label_smoothing)*Y + label_smoothing / cast(int, Y.shape[1])
  ret = -self.log_softmax(axis=1).mul(Y).sum(axis=1)
  return ret._do_reduction(reduction)
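
The source above converts integer labels to one-hot rows, blends them with a uniform distribution according to `label_smoothing`, and then takes the negative dot product with the log-softmax. A minimal sketch of that sequence for integer labels follows; it returns per-sample losses, which corresponds to `reduction='none'`.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def cross_entropy(logits, labels, label_smoothing=0.0):
    """Per-sample cross-entropy with optional label smoothing."""
    assert 0.0 <= label_smoothing <= 1.0, "label_smoothing must be in [0.0, 1.0]"
    losses = []
    for row, label in zip(logits, labels):
        n = len(row)
        # Smoothed target: (1 - s) * one_hot + s / num_classes.
        target = [(1 - label_smoothing) * (1.0 if c == label else 0.0) + label_smoothing / n
                  for c in range(n)]
        losses.append(-sum(t * lp for t, lp in zip(target, log_softmax(row))))
    return losses

logits = [[-1, 2, -3], [1, -2, 3]]
print(cross_entropy(logits, [1, 2]))                       # approximately [0.055, 0.1328]
print(cross_entropy(logits, [1, 2], label_smoothing=0.1))  # larger, since mass moves to other classes
```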

nll_loss ¤

nll_loss(
    Y: Tensor,
    weight: Tensor | None = None,
    ignore_index: int | None = None,
    reduction: ReductionStr = "mean",
) -> Tensor

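Computes the negative log-likelihood loss between log-probabilities and target labels.

Note

self is the log-probabilities and Y is the target labels or class probabilities.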

See: https://pytorch.org/docs/stable/generated/torch.nn.functional.nll_loss.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.log_softmax().nll_loss(Y).item())

0.09391524642705917

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.log_softmax().nll_loss(Y, reduction='none').numpy())

[0.055  0.1328]

Source code in tinygrad/tensor.py

def nll_loss(self, Y:Tensor, weight:Tensor|None=None, ignore_index:int|None=None, reduction:ReductionStr="mean") -> Tensor:
  """
  Compute the negative log likelihood loss between log-probabilities and target labels.

  NOTE: `self` is log-probabilities and `Y` is the Y labels or class probabilities.

  See: https://pytorch.org/docs/stable/generated/torch.nn.functional.nll_loss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.log_softmax().nll_loss(Y).item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.log_softmax().nll_loss(Y, reduction='none').numpy())
  ```
  """
  weight = Tensor.ones_like(Y, requires_grad=False) if weight is None else weight[Y]
  masked_weight = weight if ignore_index is None else weight * (Y != ignore_index)
  nll = -self.gather(1, Y.unsqueeze(1)).squeeze(1) * masked_weight
  return nll.sum() / masked_weight.sum() if reduction == "mean" else nll._do_reduction(reduction)
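
With the "mean" reduction, the source divides the weighted loss sum by the sum of the weights that were actually applied, not by the batch size. The sketch below illustrates that weighted mean on nested lists. It handles integer labels only, and skipping ignored rows before any indexing is a simplification of this sketch.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def nll_loss(log_probs, labels, weight=None, ignore_index=None):
    """Weighted mean of -log_prob[target] over non-ignored rows."""
    total, weight_sum = 0.0, 0.0
    for row, label in zip(log_probs, labels):
        if ignore_index is not None and label == ignore_index:
            continue  # ignored rows get zero weight
        w = 1.0 if weight is None else weight[label]
        total += -row[label] * w
        weight_sum += w
    return total / weight_sum

log_probs = [log_softmax(r) for r in [[-1, 2, -3], [1, -2, 3]]]
print(nll_loss(log_probs, [1, 2]))                          # approximately 0.0939
print(nll_loss(log_probs, [1, 2], weight=[1.0, 2.0, 1.0]))  # class 1 counts twice
```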
