torch_frame.data.TensorFrame

class TensorFrame(feat_dict: dict[torch_frame.stype, TensorData], col_names_dict: dict[torch_frame.stype, list[str]], y: Tensor | None = None, num_rows: int | None = None)

Bases: object

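A tensor frame holds a PyTorch tensor for each table column. Table columns are organized by their semantic type stype (e.g., categorical, numerical) and mapped to a compact tensor representation (e.g., strings in a categorical column are mapped to indices in {0, ..., num_categories - 1}), which can be accessed via feat_dict. For instance, feat_dict[stype.numerical] stores a concatenated PyTorch tensor for all numerical features, where the first and second dimensions represent the rows and columns of the original data frame, respectively.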

TensorFrame represents missing values as float('NaN') in floating-point tensors and as -1 in all other tensors.
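The categorical encoding and the -1 missing-value convention described above can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: the helper encode_categorical is hypothetical, and assigning indices in order of first appearance is an assumption made purely for illustration.

```python
from __future__ import annotations

from typing import Optional


def encode_categorical(
    values: list[Optional[str]],
) -> tuple[list[int], dict[str, int]]:
    """Map strings to indices in {0, ..., num_categories - 1}; None becomes -1."""
    mapping: dict[str, int] = {}
    encoded: list[int] = []
    for value in values:
        if value is None:
            # Missing value in a non-floating-point column.
            encoded.append(-1)
            continue
        if value not in mapping:
            # Assumption: indices are handed out in order of first appearance.
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


codes, mapping = encode_categorical(["red", "blue", None, "red", "green"])
print(codes)    # [0, 1, -1, 0, 2]
print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}
```

The resulting integer codes are the kind of values that would sit in one column of feat_dict[stype.categorical].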

col_names_dict maps each column in feat_dict to its original column name. For example, col_names_dict[stype.numerical][i] stores the column name of feat_dict[stype.numerical][:, i].
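The positional alignment between col_names_dict and feat_dict can be pictured with nested lists standing in for tensors. This is only a conceptual sketch under stated assumptions: the string keys, the sample data, and the helper lookup_column are hypothetical and are not part of torch_frame.

```python
# Nested lists stand in for tensors of shape [num_rows, num_cols].
feat_dict = {
    "numerical": [[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]],
    "categorical": [[0, 2, 1], [1, 0, -1], [2, 2, 0]],
}
col_names_dict = {
    "numerical": ["num_1", "num_2"],
    "categorical": ["cat_1", "cat_2", "cat_3"],
}


def lookup_column(col_name):
    """Return the values of the column named col_name, one per row."""
    for stype_name, col_names in col_names_dict.items():
        if col_name in col_names:
            i = col_names.index(col_name)
            # List equivalent of feat_dict[stype][:, i] on a tensor.
            return [row[i] for row in feat_dict[stype_name]]
    raise KeyError(col_name)


print(lookup_column("num_2"))  # [1.5, 3.5, 5.5]
print(lookup_column("cat_3"))  # [1, -1, 0]
```

Because the name list and the feature columns share the same order, a column name is enough to recover both the stype and the column index.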

Additionally, TensorFrame can store any target values in y.

import torch
import torch_frame

tf = torch_frame.TensorFrame(
    feat_dict={
        # Two numerical columns:
        torch_frame.numerical: torch.randn(10, 2),
        # Three categorical columns:
        torch_frame.categorical: torch.randint(0, 5, (10, 3)),
    },
    col_names_dict={
        torch_frame.numerical: ['num_1', 'num_2'],
        torch_frame.categorical: ['cat_1', 'cat_2', 'cat_3'],
    },
)

print(len(tf))
# Output: 10

# Row-wise filtering:
tf = tf[torch.tensor([0, 2, 4, 6, 8])]
print(len(tf))
# Output: 5

# Transfer tensor frame to the GPU (requires a CUDA device):
tf = tf.to('cuda')

validate() → None: Validates the TensorFrame object.

get_col_feat(col_name: str) → Union[Tensor, MultiNestedTensor, MultiEmbeddingTensor, dict[str, torch_frame.data.multi_nested_tensor.MultiNestedTensor]]

Gets the feature of a given column.

Parameters:

col_name (str) – Input column name.

Returns:

The column feature of the given col_name. Shape: [num_rows, 1, *].

Return type:

TensorData

property stypes: list[torch_frame._stype.stype]: Returns a canonical ordering of the stypes in feat_dict.

property num_cols: int: The number of columns in the TensorFrame.

property num_rows: int: The number of rows in the TensorFrame.

property device: torch.device | None: The device of the TensorFrame.

property is_empty: bool: Returns True if the TensorFrame is empty.