torch_geometric.loader

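`DataLoader`	A data loader which merges data objects from a `torch_geometric.data.Dataset` to a mini-batch.
`NodeLoader`	A data loader that performs mini-batch sampling from node information, using a generic `BaseSampler` implementation that defines a `sample_from_nodes()` function and is supported on the provided input `data` object.
`LinkLoader`	A data loader that performs mini-batch sampling from link information, using a generic `BaseSampler` implementation that defines a `sample_from_edges()` function and is supported on the provided input `data` object.
`NeighborLoader`	A data loader that performs neighbor sampling as introduced in the "Inductive Representation Learning on Large Graphs" paper.
`LinkNeighborLoader`	A link-based data loader derived as an extension of the node-based `torch_geometric.loader.NeighborLoader`.
`HGTLoader`	The Heterogeneous Graph Sampler from the "Heterogeneous Graph Transformer" paper.
`ClusterData`	Clusters/partitions a graph data object into multiple subgraphs, as motivated by the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper.
`ClusterLoader`	The data loader scheme from the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper, which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.
`GraphSAINTSampler`	The GraphSAINT sampler base class from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper.
`GraphSAINTNodeSampler`	The GraphSAINT node sampler class (see `GraphSAINTSampler`).
`GraphSAINTEdgeSampler`	The GraphSAINT edge sampler class (see `GraphSAINTSampler`).
`GraphSAINTRandomWalkSampler`	The GraphSAINT random walk sampler class (see `GraphSAINTSampler`).
`ShaDowKHopSampler`	The ShaDow \(k\)-hop sampler from the "Decoupling the Depth and Scope of Graph Neural Networks" paper.
`RandomNodeLoader`	A data loader that randomly samples nodes within a graph and returns their induced subgraph.
`ZipLoader`	A loader that returns a tuple of data objects by sampling from multiple `NodeLoader` or `LinkLoader` instances.
`DataListLoader`	A data loader which batches data objects from a `torch_geometric.data.Dataset` to a Python list.
`DenseDataLoader`	A data loader which batches data objects from a `torch_geometric.data.Dataset` to a `torch_geometric.data.Batch` object by stacking all attributes in a new dimension.
`TemporalDataLoader`	A data loader which merges successive events of a `torch_geometric.data.TemporalData` to a mini-batch.
`NeighborSampler`	The neighbor sampler from the "Inductive Representation Learning on Large Graphs" paper, which allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
`ImbalancedSampler`	A weighted random sampler that randomly samples elements according to class distribution.
`DynamicBatchSampler`	Dynamically adds samples to a mini-batch up to a maximum size (based on either the number of nodes or the number of edges).
`PrefetchLoader`	A GPU prefetcher class for asynchronously transferring data of a `torch.utils.data.DataLoader` from host memory to device memory.
`CachedLoader`	A loader to cache mini-batch outputs, e.g., obtained during `NeighborLoader` iterations.
`AffinityMixin`	A context manager to enable CPU affinity for data loader workers (only used when running on CPU devices).
`RAGQueryLoader`	A loader for making RAG queries from a remote backend.
`RAGFeatureStore`	Feature store template for a remote GNN RAG backend.
`RAGGraphStore`	Graph store template for a remote GNN RAG backend.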

class DataLoader(dataset: Union[Dataset, Sequence[BaseData], DatasetAdapter], batch_size: int = 1, shuffle: bool = False, follow_batch: Optional[List[str]] = None, exclude_keys: Optional[List[str]] = None, **kwargs)[source]

A data loader which merges data objects from a torch_geometric.data.Dataset to a mini-batch. Data objects can be either of type Data or HeteroData.

Parameters:

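dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
follow_batch (List[str], optional) – Creates assignment batch vectors for each key in the list. (default: None)
exclude_keys (List[str], optional) – Will exclude each key in the list. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader.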
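
To make "merging data objects to a mini-batch" concrete, the following is a minimal pure-Python sketch of the idea: node features are concatenated, edge indices are shifted so they stay unique, and a batch vector records which graph each node came from. It does not use PyG; the function name `merge_graphs` and the dictionary layout are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: `merge_graphs` and the dict-based graph layout
# are hypothetical and are not part of the torch_geometric API.

def merge_graphs(graphs):
    """Concatenate node features, shift edge indices, build a batch vector."""
    x, edge_index, batch = [], [[], []], []
    offset = 0
    for graph_id, graph in enumerate(graphs):
        num_nodes = len(graph["x"])
        x.extend(graph["x"])
        # Shift node indices so they remain unique inside the merged graph.
        for src, dst in zip(*graph["edge_index"]):
            edge_index[0].append(src + offset)
            edge_index[1].append(dst + offset)
        # The batch vector maps every node to the graph it originated from.
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return {"x": x, "edge_index": edge_index, "batch": batch}


g1 = {"x": [[1.0], [2.0]], "edge_index": [[0, 1], [1, 0]]}
g2 = {"x": [[3.0], [4.0], [5.0]], "edge_index": [[0, 1], [1, 2]]}
merged = merge_graphs([g1, g2])
print(merged["edge_index"])
print(merged["batch"])
```

Because the merged graph has no edges between different examples, message passing on it behaves as if each graph were processed separately.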

class NodeLoader(data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]], node_sampler: BaseSampler, input_nodes: Union[Tensor, None, str, Tuple[str, Optional[Tensor]]] = None, input_time: Optional[Tensor] = None, transform: Optional[Callable] = None, transform_sampler_output: Optional[Callable] = None, filter_per_worker: Optional[bool] = None, custom_cls: Optional[HeteroData] = None, input_id: Optional[Tensor] = None, **kwargs)[source]

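A data loader that performs mini-batch sampling from node information, using a generic BaseSampler implementation that defines a sample_from_nodes() function and is supported on the provided input data object.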

Parameters:

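data (Any) – A Data, HeteroData, or (FeatureStore, GraphStore) data object.
node_sampler (torch_geometric.sampler.BaseSampler) – The sampler implementation to be used with this loader. Needs to implement sample_from_nodes(). The sampler implementation must be compatible with the input data object.
input_nodes (torch.Tensor or str or Tuple[str, torch.Tensor]) – The indices of seed nodes to start sampling from. Needs to be given as either a torch.LongTensor or a torch.BoolTensor. If set to None, all nodes will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the node type and node indices. (default: None)
input_time (torch.Tensor, optional) – Optional values to override the timestamp for the input nodes given in input_nodes. If not set, will use the timestamps in time_attr as default (if present). The time_attr needs to be set for this to work. (default: None)
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
transform_sampler_output (callable, optional) – A function/transform that takes in a torch_geometric.sampler.SamplerOutput and returns a transformed version. (default: None)
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
custom_cls (HeteroData, optional) – A custom HeteroData class to return for mini-batches in case of remote backends. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.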

collate_fn(index: Union[Tensor, List[int]]) → Any[source]

Samples a subgraph from a batch of input nodes.

Return type: Any

filter_fn(out: Union[SamplerOutput, HeteroSamplerOutput]) → Union[Data, HeteroData][source]

Joins the sampled nodes with their corresponding features, returning the resulting Data or HeteroData object to be used downstream.

Return type: Union[Data, HeteroData]

class LinkLoader(data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]], link_sampler: BaseSampler, edge_label_index: Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]] = None, edge_label: Optional[Tensor] = None, edge_label_time: Optional[Tensor] = None, neg_sampling: Optional[NegativeSampling] = None, neg_sampling_ratio: Optional[Union[int, float]] = None, transform: Optional[Callable] = None, transform_sampler_output: Optional[Callable] = None, filter_per_worker: Optional[bool] = None, custom_cls: Optional[HeteroData] = None, input_id: Optional[Tensor] = None, **kwargs)[source]

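A data loader that performs mini-batch sampling from link information, using a generic BaseSampler implementation that defines a sample_from_edges() function and is supported on the provided input data object.

Note

Negative sampling is currently implemented in an approximate way, i.e. negative edges may contain false negatives.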

Parameters:

data (Any) – A Data, HeteroData, or (FeatureStore, GraphStore) data object.
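link_sampler (torch_geometric.sampler.BaseSampler) – The sampler implementation to be used with this loader. Needs to implement sample_from_edges(). The sampler implementation must be compatible with the input data object.
edge_label_index (Tensor or EdgeType or Tuple[EdgeType, Tensor]) – The edge indices, holding source and destination nodes to start sampling from. If set to None, all edges will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the edge type and corresponding edge indices. (default: None)
edge_label (Tensor, optional) – The labels of the edge indices to start sampling from. Must be the same length as edge_label_index. (default: None)
edge_label_time (Tensor, optional) – The timestamps of the edge indices to start sampling from. Must be the same length as edge_label_index. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e., neighbors have an earlier timestamp than the output edge. The time_attr needs to be set for this to work. (default: None)
neg_sampling (NegativeSampling, optional) – The negative sampling configuration. For negative sampling mode "binary", samples can be accessed via the attributes edge_label_index and edge_label in the respective edge type of the returned mini-batch. In case edge_label does not exist, it will be automatically created and represents a binary classification task (0 = negative edge, 1 = positive edge). In case edge_label does exist, it has to be a categorical label from 0 to num_classes - 1. After negative sampling, label 0 represents negative edges, and labels 1 to num_classes represent the labels of positive edges. Note that returned labels are of type torch.float for binary classification (to facilitate the ease-of-use of F.binary_cross_entropy()) and of type torch.long for multi-class classification (to facilitate the ease-of-use of F.cross_entropy()). For negative sampling mode "triplet", samples can be accessed via the attributes src_index, dst_pos_index and dst_neg_index in the respective node types of the returned mini-batch. edge_label needs to be None for "triplet" negative sampling mode. If set to None, no negative sampling strategy is applied. (default: None)
neg_sampling_ratio (int or float, optional) – The ratio of sampled negative edges to the number of positive edges. Deprecated in favor of the neg_sampling argument. (default: None)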
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
transform_sampler_output (callable, optional) – A function/transform that takes in a torch_geometric.sampler.SamplerOutput and returns a transformed version. (default: None)
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
custom_cls (HeteroData, optional) – A custom HeteroData class to return for mini-batches in case of remote backends. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
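
The label convention described for the "binary" negative sampling mode can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper `add_negative_edges` is hypothetical, and negatives are drawn as uniformly random node pairs without checking against existing edges, which mirrors the "approximate" behavior noted above (false negatives are possible).

```python
import random

# Illustrative sketch only: `add_negative_edges` is a hypothetical helper,
# not a torch_geometric function.

def add_negative_edges(pos_edges, num_nodes, ratio=1.0, labels=None, seed=0):
    """Append random negative edges and build labels per the 'binary' mode."""
    rng = random.Random(seed)
    num_neg = int(round(len(pos_edges) * ratio))
    # Approximate negatives: random pairs, not checked against real edges.
    neg_edges = [(rng.randrange(num_nodes), rng.randrange(num_nodes))
                 for _ in range(num_neg)]
    if labels is None:
        # No labels given: binary task, 1.0 = positive, 0.0 = negative.
        pos_labels = [1.0] * len(pos_edges)
        neg_labels = [0.0] * num_neg
    else:
        # Categorical labels 0..C-1 are shifted to 1..C; 0 marks negatives.
        pos_labels = [label + 1 for label in labels]
        neg_labels = [0] * num_neg
    return pos_edges + neg_edges, pos_labels + neg_labels


pos = [(0, 1), (1, 2)]
edges, binary_labels = add_negative_edges(pos, num_nodes=5, ratio=2.0)
_, multi_labels = add_negative_edges(pos, num_nodes=5, labels=[0, 2])
print(binary_labels)
print(multi_labels)
```

The shift by one in the multi-class case is why a model trained on such batches needs num_classes + 1 output classes.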

collate_fn(index: Union[Tensor, List[int]]) → Any[source]

Samples a subgraph from a batch of input edges.

Return type: Any

filter_fn(out: Union[SamplerOutput, HeteroSamplerOutput]) → Union[Data, HeteroData][source]

Joins the sampled nodes with their corresponding features, returning the resulting Data or HeteroData object to be used downstream.

Return type: Union[Data, HeteroData]

class NeighborLoader(data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]], num_neighbors: Union[List[int], Dict[Tuple[str, str, str], List[int]]], input_nodes: Union[Tensor, None, str, Tuple[str, Optional[Tensor]]] = None, input_time: Optional[Tensor] = None, replace: bool = False, subgraph_type: Union[SubgraphType, str] = 'directional', disjoint: bool = False, temporal_strategy: str = 'uniform', time_attr: Optional[str] = None, weight_attr: Optional[str] = None, transform: Optional[Callable] = None, transform_sampler_output: Optional[Callable] = None, is_sorted: bool = False, filter_per_worker: Optional[bool] = None, neighbor_sampler: Optional[NeighborSampler] = None, directed: bool = True, **kwargs)[source]

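A data loader that performs neighbor sampling as introduced in the "Inductive Representation Learning on Large Graphs" paper. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.

More specifically, num_neighbors denotes how many neighbors are sampled for each node in each iteration. NeighborLoader takes in this list of num_neighbors and iteratively samples num_neighbors[i] neighbors for each node involved in iteration i - 1.

Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.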

from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

data = Planetoid(path, name='Cora')[0]

loader = NeighborLoader(
    data,
    # Sample 30 neighbors for each node for 2 iterations
    num_neighbors=[30] * 2,
    # Use a batch size of 128 for sampling training nodes
    batch_size=128,
    input_nodes=data.train_mask,
)

sampled_data = next(iter(loader))
print(sampled_data.batch_size)
>>> 128

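By default, the data loader will only include the edges that were originally sampled (directed = True). This option should only be used when the number of hops equals the number of GNN layers. If the number of GNN layers is greater than the number of hops, consider setting directed = False, which will include all edges between all sampled nodes (but is slightly slower as a result).

Furthermore, NeighborLoader works for both homogeneous graphs stored via Data and heterogeneous graphs stored via HeteroData. When operating on heterogeneous graphs, up to num_neighbors neighbors will be sampled for each edge_type. However, more fine-grained control over the number of sampled neighbors of individual edge types is possible: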

from torch_geometric.datasets import OGB_MAG
from torch_geometric.loader import NeighborLoader

hetero_data = OGB_MAG(path)[0]

loader = NeighborLoader(
    hetero_data,
    # Sample 30 neighbors for each node and edge type for 2 iterations
    num_neighbors={key: [30] * 2 for key in hetero_data.edge_types},
    # Use a batch size of 128 for sampling training nodes of type paper
    batch_size=128,
    input_nodes=('paper', hetero_data['paper'].train_mask),
)

sampled_hetero_data = next(iter(loader))
print(sampled_hetero_data['paper'].batch_size)
>>> 128

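Note

For an example of using NeighborLoader, see examples/hetero/to_hetero_mag.py.

NeighborLoader will return subgraphs where global node indices are mapped to local indices corresponding to this specific subgraph. However, it is often desirable to map the nodes of the current subgraph back to the global node indices. NeighborLoader will include this mapping as part of the data object: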

loader = NeighborLoader(data, ...)
sampled_data = next(iter(loader))
print(sampled_data.n_id)  # Global node index of each node in batch.

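In particular, the data loader will add the following attributes to the returned mini-batch:

batch_size: The number of seed nodes (first nodes in the batch)
n_id: The global node index for every sampled node
e_id: The global edge index for every sampled edge
input_id: The global index of the input_nodes
num_sampled_nodes: The number of sampled nodes in each hop
num_sampled_edges: The number of sampled edges in each hop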
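
The interplay between iterative sampling, seed-first ordering, and the n_id mapping can be sketched in plain Python. This is only an illustration of the idea, not PyG's sampler: the function `sample_neighbors` is hypothetical, it samples without replacement, and it treats an entry of -1 in num_neighbors as "take all neighbors".

```python
import random

# Illustrative sketch only: `sample_neighbors` is hypothetical and does not
# reproduce the exact behavior of torch_geometric's NeighborLoader.

def sample_neighbors(edges, seeds, num_neighbors, seed=0):
    """Return (n_id, local_edges); seed nodes occupy the first positions."""
    rng = random.Random(seed)
    # Collect incoming neighbors: messages flow from src to dst.
    in_neighbors = {}
    for src, dst in edges:
        in_neighbors.setdefault(dst, []).append(src)

    n_id = list(seeds)  # global ids in sampling order, seeds first
    local = {node: i for i, node in enumerate(n_id)}
    frontier, local_edges = list(seeds), []
    for fanout in num_neighbors:
        next_frontier = []
        for dst in frontier:
            candidates = in_neighbors.get(dst, [])
            if fanout >= 0 and len(candidates) > fanout:
                candidates = rng.sample(candidates, fanout)
            for src in candidates:
                if src not in local:
                    # Newly discovered node: assign the next local index.
                    local[src] = len(n_id)
                    n_id.append(src)
                    next_frontier.append(src)
                local_edges.append((local[src], local[dst]))
        frontier = next_frontier
    return n_id, local_edges


graph_edges = [(1, 0), (2, 0), (3, 1), (4, 1), (0, 2)]
n_id, local_edges = sample_neighbors(graph_edges, seeds=[2], num_neighbors=[-1, -1])
# Map local edge endpoints back to global node ids via n_id.
global_edges = [(n_id[s], n_id[d]) for s, d in local_edges]
print(n_id, local_edges, global_edges)
```

Indexing n_id with a local index recovers the global node id, which is the same lookup one would perform with sampled_data.n_id.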

Parameters:

data (Any) – A Data, HeteroData, or (FeatureStore, GraphStore) data object.
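num_neighbors (List[int] or Dict[Tuple[str, str, str], List[int]]) – The number of neighbors to sample for each node in each iteration. If an entry is set to -1, all neighbors will be included. In heterogeneous graphs, may also take in a dictionary denoting the number of neighbors to sample for each individual edge type.
input_nodes (torch.Tensor or str or Tuple[str, torch.Tensor]) – The indices of nodes for which neighbors are sampled to create mini-batches. Needs to be given as either a torch.LongTensor or a torch.BoolTensor. If set to None, all nodes will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the node type and node indices. (default: None)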
input_time (torch.Tensor, optional) – Optional values to override the timestamp for the input nodes given in input_nodes. If not set, will use the timestamps in time_attr as default (if present). The time_attr needs to be set for this to work. (default: None)
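replace (bool, optional) – If set to True, will sample with replacement. (default: False)
subgraph_type (SubgraphType or str, optional) – The type of the returned subgraph. If set to "directional", the returned subgraph only holds the sampled (directed) edges which are necessary to compute representations for the sampled seed nodes. If set to "bidirectional", sampled edges are converted to bidirectional edges. If set to "induced", the returned subgraph contains the induced subgraph of all sampled nodes. (default: "directional")
disjoint (bool, optional) – If set to True, each seed node will create its own disjoint subgraph, and mini-batch outputs will have a batch vector holding the mapping of nodes to their respective subgraph. Will get automatically set to True in case of temporal sampling. (default: False)
temporal_strategy (str, optional) – The sampling strategy when using temporal sampling ("uniform", "last"). If set to "uniform", will sample uniformly across neighbors that fulfill temporal constraints. If set to "last", will sample the last num_neighbors that fulfill temporal constraints. (default: "uniform")
time_attr (str, optional) – The name of the attribute that denotes timestamps for either the nodes or edges in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier or equal timestamp than the center node. (default: None)
weight_attr (str, optional) – The name of the attribute that denotes edge weights in the graph. If set, weighted/biased sampling will be used such that neighbors are more likely to get sampled the higher their edge weights are. Edge weights do not need to sum to one, but must be non-negative, finite and have a non-zero sum within local neighborhoods. (default: None)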
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
transform_sampler_output (callable, optional) – A function/transform that takes in a torch_geometric.sampler.SamplerOutput and returns a transformed version. (default: None)
is_sorted (bool, optional) – If set to True, assumes that edge_index is sorted by column. If time_attr is set, additionally requires that rows are sorted according to time within individual neighborhoods. This avoids internal re-sorting of the data and can improve runtime and memory efficiency. (default: False)
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
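
The temporal constraint behind time_attr and the two temporal_strategy values can be sketched as follows. This is a simplified pure-Python illustration, not the library's sampler: the helpers `temporal_candidates` and `pick` are hypothetical, and reading "last" as "the most recent valid neighbors by timestamp" is an assumption.

```python
import random

# Illustrative sketch only: both helpers are hypothetical, and the
# interpretation of "last" as "most recent by timestamp" is an assumption.

def temporal_candidates(neighbors, neighbor_time, seed_time):
    """Keep neighbors whose timestamp is earlier than or equal to the seed's."""
    return [n for n in neighbors if neighbor_time[n] <= seed_time]


def pick(neighbors, neighbor_time, seed_time, k, strategy="uniform", rng=None):
    """Select up to k temporally valid neighbors with the given strategy."""
    valid = temporal_candidates(neighbors, neighbor_time, seed_time)
    if k < 0 or len(valid) <= k:
        return valid
    if strategy == "last":
        # Take the k valid neighbors with the latest timestamps.
        return sorted(valid, key=lambda n: neighbor_time[n])[-k:]
    rng = rng or random.Random(0)
    return rng.sample(valid, k)


neighbor_time = {10: 1, 11: 5, 12: 3, 13: 9}
last_two = pick([10, 11, 12, 13], neighbor_time, seed_time=5, k=2, strategy="last")
uniform_two = pick([10, 11, 12, 13], neighbor_time, seed_time=5, k=2)
print(last_two, uniform_two)
```

Node 13 is never returned here because its timestamp (9) is later than the seed time (5), so using it would leak information from the future.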

class LinkNeighborLoader(data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]], num_neighbors: Union[List[int], Dict[Tuple[str, str, str], List[int]]], edge_label_index: Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]] = None, edge_label: Optional[Tensor] = None, edge_label_time: Optional[Tensor] = None, replace: bool = False, subgraph_type: Union[SubgraphType, str] = 'directional', disjoint: bool = False, temporal_strategy: str = 'uniform', neg_sampling: Optional[NegativeSampling] = None, neg_sampling_ratio: Optional[Union[int, float]] = None, time_attr: Optional[str] = None, weight_attr: Optional[str] = None, transform: Optional[Callable] = None, transform_sampler_output: Optional[Callable] = None, is_sorted: bool = False, filter_per_worker: Optional[bool] = None, neighbor_sampler: Optional[NeighborSampler] = None, directed: bool = True, **kwargs)[source]

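A link-based data loader derived as an extension of the node-based torch_geometric.loader.NeighborLoader. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.

More specifically, this loader first selects a sample of edges from the set of input edges edge_label_index (which may or may not be edges in the original graph) and then constructs a subgraph from all the nodes present in this list by sampling num_neighbors neighbors in each iteration.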

from torch_geometric.datasets import Planetoid
from torch_geometric.loader import LinkNeighborLoader

data = Planetoid(path, name='Cora')[0]

loader = LinkNeighborLoader(
    data,
    # Sample 30 neighbors for each node for 2 iterations
    num_neighbors=[30] * 2,
    # Use a batch size of 128 for sampling supervision edges
    batch_size=128,
    edge_label_index=data.edge_index,
)

sampled_data = next(iter(loader))
print(sampled_data)
>>> Data(x=[1368, 1433], edge_index=[2, 3103], y=[1368],
         train_mask=[1368], val_mask=[1368], test_mask=[1368],
         edge_label_index=[2, 128])

It is additionally possible to provide edge labels for sampled edges, which are then added to the batch:

loader = LinkNeighborLoader(
    data,
    num_neighbors=[30] * 2,
    batch_size=128,
    edge_label_index=data.edge_index,
    edge_label=torch.ones(data.edge_index.size(1))
)

sampled_data = next(iter(loader))
print(sampled_data)
>>> Data(x=[1368, 1433], edge_index=[2, 3103], y=[1368],
         train_mask=[1368], val_mask=[1368], test_mask=[1368],
         edge_label_index=[2, 128], edge_label=[128])

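The rest of the functionality mirrors that of NeighborLoader, including support for heterogeneous graphs. In particular, the data loader will add the following attributes to the returned mini-batch:

n_id: The global node index for every sampled node
e_id: The global edge index for every sampled edge
input_id: The global index of the edge_label_index
num_sampled_nodes: The number of sampled nodes in each hop
num_sampled_edges: The number of sampled edges in each hop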

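Note

Negative sampling is currently implemented in an approximate way, i.e. negative edges may contain false negatives.

Warning

Note that the sampling scheme is independent of the edges we are making predictions on. That is, by default, supervision edges in edge_label_index will not get masked out during sampling. If there is an overlap between message passing edges in data.edge_index and supervision edges in edge_label_index, you might end up sampling an edge you are making a prediction on. You can generally avoid this behavior (if desired) by making data.edge_index and edge_label_index two disjoint sets of edges, e.g., via the RandomLinkSplit transformation and its disjoint_train_ratio argument.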
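
The idea of keeping message passing edges and supervision edges disjoint can be sketched without any library. The helper `split_edges` below is hypothetical and only illustrates the principle; it is not how RandomLinkSplit is implemented, and it ignores details such as undirected graphs or validation/test splits.

```python
import random

# Illustrative sketch only: `split_edges` is a hypothetical helper and does
# not reproduce torch_geometric.transforms.RandomLinkSplit.

def split_edges(edges, supervision_ratio, seed=0):
    """Randomly split edges into disjoint message-passing/supervision sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    num_supervision = int(len(shuffled) * supervision_ratio)
    supervision = shuffled[:num_supervision]      # plays the role of edge_label_index
    message_passing = shuffled[num_supervision:]  # plays the role of edge_index
    return message_passing, supervision


all_edges = [(i, i + 1) for i in range(10)]
message_passing, supervision = split_edges(all_edges, supervision_ratio=0.5)
print(len(message_passing), len(supervision))
```

With such a split, a sampled neighborhood can never contain the edge whose existence the model is asked to predict.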

Parameters:

data (Any) – A Data, HeteroData, or (FeatureStore, GraphStore) data object.
num_neighbors (List[int] or Dict[Tuple[str, str, str], List[int]]) – The number of neighbors to sample for each node in each iteration. If an entry is set to -1, all neighbors will be included. In heterogeneous graphs, may also take in a dictionary denoting the amount of neighbors to sample for each individual edge type.
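edge_label_index (Tensor or EdgeType or Tuple[EdgeType, Tensor]) – The edge indices for which neighbors are sampled to create mini-batches. If set to None, all edges will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the edge type and corresponding edge indices. (default: None)
edge_label (Tensor, optional) – The labels of the edge indices for which neighbors are sampled. Must be the same length as edge_label_index. If set to None, it is set to torch.zeros(…) internally. (default: None)
edge_label_time (Tensor, optional) – The timestamps of the edge indices for which neighbors are sampled. Must be the same length as edge_label_index. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e., neighbors have an earlier timestamp than the output edge. The time_attr needs to be set for this to work. (default: None)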
replace (bool, optional) – If set to True, will sample with replacement. (default: False)
subgraph_type (SubgraphType or str, optional) – The type of the returned subgraph. If set to "directional", the returned subgraph only holds the sampled (directed) edges which are necessary to compute representations for the sampled seed nodes. If set to "bidirectional", sampled edges are converted to bidirectional edges. If set to "induced", the returned subgraph contains the induced subgraph of all sampled nodes. (default: "directional")
disjoint (bool, optional) – If set to True, each seed node will create its own disjoint subgraph, and mini-batch outputs will have a batch vector holding the mapping of nodes to their respective subgraph. Will get automatically set to True in case of temporal sampling. (default: False)
temporal_strategy (str, optional) – The sampling strategy when using temporal sampling ("uniform", "last"). If set to "uniform", will sample uniformly across neighbors that fulfill temporal constraints. If set to "last", will sample the last num_neighbors that fulfill temporal constraints. (default: "uniform")
neg_sampling (NegativeSampling, optional) – The negative sampling configuration. For negative sampling mode "binary", samples can be accessed via the attributes edge_label_index and edge_label in the respective edge type of the returned mini-batch. In case edge_label does not exist, it will be automatically created and represents a binary classification task (0 = negative edge, 1 = positive edge). In case edge_label does exist, it has to be a categorical label from 0 to num_classes - 1. After negative sampling, label 0 represents negative edges, and labels 1 to num_classes represent the labels of positive edges. Note that returned labels are of type torch.float for binary classification (to facilitate the ease-of-use of F.binary_cross_entropy()) and of type torch.long for multi-class classification (to facilitate the ease-of-use of F.cross_entropy()). For negative sampling mode "triplet", samples can be accessed via the attributes src_index, dst_pos_index and dst_neg_index in the respective node types of the returned mini-batch. edge_label needs to be None for "triplet" negative sampling mode. If set to None, no negative sampling strategy is applied. (default: None)
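neg_sampling_ratio (int or float, optional) – The ratio of sampled negative edges to the number of positive edges. Deprecated in favor of the neg_sampling argument. (default: None)
time_attr (str, optional) – The name of the attribute that denotes timestamps for either the nodes or edges in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier or equal timestamp than the center node. Only used if edge_label_time is set. (default: None)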
weight_attr (str, optional) – The name of the attribute that denotes edge weights in the graph. If set, weighted/biased sampling will be used such that neighbors are more likely to get sampled the higher their edge weights are. Edge weights do not need to sum to one, but must be non-negative, finite and have a non-zero sum within local neighborhoods. (default: None)
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
transform_sampler_output (callable, optional) – A function/transform that takes in a torch_geometric.sampler.SamplerOutput and returns a transformed version. (default: None)
is_sorted (bool, optional) – If set to True, assumes that edge_index is sorted by column. If time_attr is set, additionally requires that rows are sorted according to time within individual neighborhoods. This avoids internal re-sorting of the data and can improve runtime and memory efficiency. (default: False)
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.

class HGTLoader(data: Union[HeteroData, Tuple[FeatureStore, GraphStore]], num_samples: Union[List[int], Dict[str, List[int]]], input_nodes: Union[str, Tuple[str, Optional[Tensor]]], is_sorted: bool = False, transform: Optional[Callable] = None, transform_sampler_output: Optional[Callable] = None, filter_per_worker: Optional[bool] = None, **kwargs)[source]

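The Heterogeneous Graph Sampler from the "Heterogeneous Graph Transformer" paper. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.

HGTLoader tries to (1) keep a similar number of nodes and edges for each type and (2) keep the sampled subgraph dense to minimize the information loss and reduce the sample variance.

Methodically, HGTLoader keeps track of a node budget for each node type, which is then used to determine the sampling probability of a node. In particular, the probability of sampling a node is determined by the number of connections to already sampled nodes and their node degrees. With this, HGTLoader will sample a fixed number of neighbors for each node type in each iteration, as given by the num_samples argument.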

Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.

Note

For an example of using HGTLoader, see examples/hetero/to_hetero_mag.py.

from torch_geometric.loader import HGTLoader
from torch_geometric.datasets import OGB_MAG

hetero_data = OGB_MAG(path)[0]

loader = HGTLoader(
    hetero_data,
    # Sample 512 nodes per type and per iteration for 4 iterations
    num_samples={key: [512] * 4 for key in hetero_data.node_types},
    # Use a batch size of 128 for sampling training nodes of type paper
    batch_size=128,
    input_nodes=('paper', hetero_data['paper'].train_mask),
)

sampled_hetero_data = next(iter(loader))
print(sampled_hetero_data['paper'].batch_size)
>>> 128

Parameters:

data (Any) – A Data, HeteroData, or (FeatureStore, GraphStore) data object.
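num_samples (List[int] or Dict[str, List[int]]) – The number of nodes to sample in each iteration and for each node type. If given as a list, will sample the same number of nodes for each node type.
input_nodes (str or Tuple[str, torch.Tensor]) – The indices of nodes for which neighbors are sampled to create mini-batches. Needs to be passed as a tuple that holds the node type and corresponding node indices. Node indices need to be given as either a torch.LongTensor or a torch.BoolTensor. If node indices are set to None, all nodes of this specific type will be considered.
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)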
transform_sampler_output (callable, optional) – A function/transform that takes in a torch_geometric.sampler.SamplerOutput and returns a transformed version. (default: None)
is_sorted (bool, optional) – If set to True, assumes that edge_index is sorted by column. This avoids internal re-sorting of the data and can improve runtime and memory efficiency. (default: False)
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.

class ClusterData(data, num_parts: int, recursive: bool = False, save_dir: Optional[str] = None, filename: Optional[str] = None, log: bool = True, keep_inter_cluster_edges: bool = False, sparse_format: Literal['csr', 'csc'] = 'csr')[source]

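Clusters/partitions a graph data object into multiple subgraphs, as motivated by the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper.

Note

The underlying METIS algorithm requires undirected graphs as input.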

Parameters:

data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
recursive (bool, optional) – If set to True, will use multilevel recursive bisection instead of multilevel k-way partitioning. (default: False)
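save_dir (str, optional) – If set, will save the partitioned data to the save_dir directory for faster re-use. (default: None)
filename (str, optional) – Name of the stored partitioned file. (default: None)
log (bool, optional) – If set to False, will not log any progress. (default: True)
keep_inter_cluster_edges (bool, optional) – If set to True, will keep inter-cluster edge connections. (default: False)
sparse_format (str, optional) – The sparse format to use for computing partitions. (default: "csr")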

class ClusterLoader(cluster_data, **kwargs)[source]

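The data loader scheme from the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper, which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.

Note

Use ClusterData and ClusterLoader in conjunction to form mini-batches of clusters. For an example of using Cluster-GCN, see examples/cluster_gcn_reddit.py or examples/cluster_gcn_ppi.py.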

Parameters:

cluster_data (torch_geometric.loader.ClusterData) – The already partitioned data object.
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
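
How several partitions and their between-cluster links form one mini-batch can be illustrated with a small pure-Python sketch. It assumes a partition assignment is already available (in practice computed by METIS through ClusterData); the helper `merge_clusters` is hypothetical and not part of PyG.

```python
# Illustrative sketch only: `merge_clusters` is hypothetical; the partition
# vector is given by hand here instead of being computed by METIS.

def merge_clusters(edges, partition, cluster_ids):
    """Build the induced subgraph over the union of the selected clusters."""
    chosen = set(cluster_ids)
    nodes = [n for n, part in enumerate(partition) if part in chosen]
    local = {n: i for i, n in enumerate(nodes)}
    # Keep edges inside a cluster as well as edges linking two chosen clusters.
    batch_edges = [(local[s], local[d]) for s, d in edges
                   if s in local and d in local]
    return nodes, batch_edges


partition = [0, 0, 1, 1, 2, 2]  # cluster id of every node
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
nodes_a, edges_a = merge_clusters(edges, partition, cluster_ids=[0, 1])
nodes_b, edges_b = merge_clusters(edges, partition, cluster_ids=[0, 2])
print(nodes_a, edges_a)
print(nodes_b, edges_b)
```

In the first batch the edge (1, 2) between clusters 0 and 1 is retained, whereas clusters 0 and 2 share no link, so the second batch only contains within-cluster edges with relabeled node indices.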

class GraphSAINTSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]

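The GraphSAINT sampler base class from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper. Given a graph in a data object, this class samples nodes and constructs subgraphs that can be processed in a mini-batch fashion. Normalization coefficients for each mini-batch are given via the node_norm and edge_norm data attributes.

Note

See GraphSAINTNodeSampler, GraphSAINTEdgeSampler and GraphSAINTRandomWalkSampler for currently supported samplers. For an example of using GraphSAINT sampling, see examples/graph_saint.py.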

Parameters:

data (torch_geometric.data.Data) – The graph data object.
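batch_size (int) – The approximate number of samples per batch.
num_steps (int, optional) – The number of iterations per epoch. (default: 1)
sample_coverage (int) – How many samples per node should be used to compute normalization statistics. (default: 0)
save_dir (str, optional) – If set, will save normalization statistics to the save_dir directory for faster re-use. (default: None)
log (bool, optional) – If set to False, will not log any pre-processing progress. (default: True)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size or num_workers.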

class GraphSAINTNodeSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]

The GraphSAINT node sampler class (see GraphSAINTSampler).

class GraphSAINTEdgeSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]

The GraphSAINT edge sampler class (see GraphSAINTSampler).

class GraphSAINTRandomWalkSampler(data, batch_size: int, walk_length: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]

The GraphSAINT random walk sampler class (see GraphSAINTSampler).

Parameters:

walk_length (int) – The length of each random walk.

class ShaDowKHopSampler(data: Data, depth: int, num_neighbors: int, node_idx: Optional[Tensor] = None, replace: bool = False, **kwargs)[source]

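The ShaDow \(k\)-hop sampler from the "Decoupling the Depth and Scope of Graph Neural Networks" paper. Given a graph in a data object, the sampler will create shallow, localized subgraphs. A deep GNN on this local graph then smooths the informative local signals.

Note

For an example of using ShaDowKHopSampler, see examples/shadow.py.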

Parameters:

data (torch_geometric.data.Data) – The graph data object.
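depth (int) – The depth/number of hops of the localized subgraph.
num_neighbors (int) – The number of neighbors to sample for each node in each hop.
node_idx (LongTensor or BoolTensor, optional) – The nodes that should be considered for creating mini-batches. If set to None, all nodes will be considered.
replace (bool, optional) – If set to True, will sample neighbors with replacement. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size or num_workers.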

class RandomNodeLoader(data: Union[Data, HeteroData], num_parts: int, **kwargs)[source]

A data loader that randomly samples nodes within a graph and returns their induced subgraph.

Note

For an example of using RandomNodeLoader, see examples/ogbn_proteins_deepgcn.py.

Parameters:

data (torch_geometric.data.Data or torch_geometric.data.HeteroData) – The Data or HeteroData graph object.
num_parts (int) – The number of partitions.
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as num_workers.

class ZipLoader(loaders: Union[List[NodeLoader], List[LinkLoader]], filter_per_worker: Optional[bool] = None, **kwargs)[source]

A loader that returns a tuple of data objects by sampling from multiple NodeLoader or LinkLoader instances.

Parameters:

loaders (List[NodeLoader] or List[LinkLoader]) – The loader instances.
filter_per_worker (bool, optional) – If set to True, will filter the returned data in each worker’s subprocess. If set to False, will filter the returned data in the main process. If set to None, will automatically infer the decision based on whether data partially lives on the GPU (filter_per_worker=True) or entirely on the CPU (filter_per_worker=False). There exist different trade-offs for setting this option. Specifically, setting this option to True for in-memory datasets will move all features to shared memory, which may result in too many open file handles. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.

class DataListLoader(dataset: Union[Dataset, List[BaseData]], batch_size: int = 1, shuffle: bool = False, **kwargs)[source]

A data loader which batches data objects from a torch_geometric.data.Dataset to a Python list. Data objects can be either of type Data or HeteroData.

Note

This data loader should be used for multi-GPU support via torch_geometric.nn.DataParallel.

Parameters:

dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
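**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as drop_last or num_workers.

class DenseDataLoader(dataset: Union[Dataset, List[Data]], batch_size: int = 1, shuffle: bool = False, **kwargs)[source]

A data loader which batches data objects from a torch_geometric.data.Dataset to a torch_geometric.data.Batch object by stacking all attributes in a new dimension.

Note

To make use of this data loader, all graph attributes in the dataset need to have the same shape. In particular, this data loader should only be used when working with dense adjacency matrices.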

Parameters:

dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as drop_last or num_workers.

class TemporalDataLoader(data: TemporalData, batch_size: int = 1, neg_sampling_ratio: float = 0.0, **kwargs)[source]

A data loader which merges successive events of a torch_geometric.data.TemporalData to a mini-batch.

Parameters:

data (TemporalData) – The TemporalData from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
neg_sampling_ratio (float, optional) – The ratio of sampled negative destination nodes to the number of positive destination nodes. (default: 0.0)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader.
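
Merging successive events into mini-batches amounts to slicing the event stream into consecutive chunks. The sketch below illustrates this in plain Python under the assumption that events are already sorted by time; the generator `temporal_batches` is hypothetical and ignores negative sampling.

```python
# Illustrative sketch only: `temporal_batches` is a hypothetical helper that
# assumes the event lists are already ordered by timestamp.

def temporal_batches(src, dst, t, batch_size):
    """Yield consecutive slices (src, dst, t) of a time-ordered event stream."""
    for start in range(0, len(t), batch_size):
        end = start + batch_size
        yield src[start:end], dst[start:end], t[start:end]


batches = list(temporal_batches(
    src=[0, 1, 2, 3, 4],
    dst=[5, 6, 7, 8, 9],
    t=[10, 11, 12, 13, 14],
    batch_size=2,
))
for batch in batches:
    print(batch)
```

Since the order of events is preserved, every batch only contains events that happen after those of all previous batches.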

class NeighborSampler(edge_index: Union[Tensor, SparseTensor], sizes: List[int], node_idx: Optional[Tensor] = None, num_nodes: Optional[int] = None, return_e_id: bool = True, transform: Optional[Callable] = None, **kwargs)[source]

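The neighbor sampler from the "Inductive Representation Learning on Large Graphs" paper, which allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.

Given a GNN with \(L\) layers and a specific mini-batch of nodes node_idx for which we want to compute embeddings, this module iteratively samples neighbors and constructs bipartite graphs that simulate the actual computation flow of GNNs.

More specifically, sizes denotes how many neighbors we want to sample for each node in each layer. This module then takes in these sizes and iteratively samples sizes[l] neighbors for each node involved in layer l. In the next layer, sampling is repeated for the union of nodes that were already encountered. The actual computation graphs are then returned in reverse-mode, meaning that we pass messages from a larger set of nodes to a smaller one, until we reach the nodes for which we originally wanted to compute embeddings.

Hence, an item returned by NeighborSampler holds the current batch_size, the IDs n_id of all nodes involved in the computation, and a list of bipartite graph objects via the tuple (edge_index, e_id, size), where edge_index represents the bipartite edges between source and target nodes, e_id denotes the IDs of original edges in the full graph, and size holds the shape of the bipartite graph. For each bipartite graph, target nodes are also included at the beginning of the list of source nodes so that one can easily apply skip-connections or add self-loops.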
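
The structure of a single bipartite graph, with target nodes placed at the beginning of the source node list, can be sketched in plain Python. The helper `build_bipartite` is hypothetical and handles only one layer with an already sampled edge list; it is meant to clarify the (edge_index, size) layout, not to reproduce NeighborSampler.

```python
# Illustrative sketch only: `build_bipartite` is a hypothetical helper for a
# single layer; edge sampling itself is assumed to have happened already.

def build_bipartite(targets, sampled_edges):
    """Return (sources, edge_index, size) for edges (global_src, global_dst)."""
    sources = list(targets)  # target nodes come first in the source list
    src_pos = {n: i for i, n in enumerate(sources)}
    dst_pos = {n: i for i, n in enumerate(targets)}
    rows, cols = [], []
    for s, d in sampled_edges:
        if s not in src_pos:
            # New source node: append it after the targets.
            src_pos[s] = len(sources)
            sources.append(s)
        rows.append(src_pos[s])  # index into the source node list
        cols.append(dst_pos[d])  # index into the target node list
    size = (len(sources), len(targets))
    return sources, [rows, cols], size


targets = [7, 3]
sources, edge_index, size = build_bipartite(targets, [(5, 7), (3, 7), (9, 3)])
print(sources, edge_index, size)
```

Because the first size[1] source nodes are exactly the target nodes, slicing source features as x[:size[1]] yields the target features needed for a skip-connection.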

Warning

NeighborSampler is deprecated and will be removed in a future release. Use torch_geometric.loader.NeighborLoader instead.

Note

For an example of using NeighborSampler, see examples/reddit.py or examples/ogbn_train.py.

Parameters:

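edge_index (Tensor or SparseTensor) – A torch.LongTensor or a torch_sparse.SparseTensor that defines the underlying graph connectivity/message passing flow. edge_index holds the indices of a (sparse) symmetric adjacency matrix. If edge_index is of type torch.LongTensor, its shape must be [2, num_edges], where messages from nodes in edge_index[0] are sent to nodes in edge_index[1] (in case flow="source_to_target"). If edge_index is of type torch_sparse.SparseTensor, its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0]. The major difference between both formats is that we need to input the transposed sparse adjacency matrix.
sizes ([int]) – The number of neighbors to sample for each node in each layer. If set to sizes[l] = -1, all neighbors are included in layer l.
node_idx (LongTensor, optional) – The nodes that should be considered for creating mini-batches. If set to None, all nodes will be considered.
num_nodes (int, optional) – The number of nodes in the graph. (default: None)
return_e_id (bool, optional) – If set to False, will not return original edge indices of sampled edges. This is only useful when operating on graphs without edge features, in order to save memory. (default: True)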
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.

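class ImbalancedSampler(dataset: Union[Dataset, Data, List[Data], Tensor], input_nodes: Optional[Tensor] = None, num_samples: Optional[int] = None)[source]

A weighted random sampler that randomly samples elements according to class distribution. As such, it will either remove samples from the majority class (under-sampling) or add more examples from the minority class (over-sampling).

Graph-level sampling: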

from torch_geometric.loader import DataLoader, ImbalancedSampler

sampler = ImbalancedSampler(dataset)
loader = DataLoader(dataset, batch_size=64, sampler=sampler, ...)

Node-level sampling:

from torch_geometric.loader import NeighborLoader, ImbalancedSampler

sampler = ImbalancedSampler(data, input_nodes=data.train_mask)
loader = NeighborLoader(data, input_nodes=data.train_mask,
                        batch_size=64, num_neighbors=[-1, -1],
                        sampler=sampler, ...)

You can also pass in the class labels directly as a torch.Tensor:

from torch_geometric.loader import NeighborLoader, ImbalancedSampler

sampler = ImbalancedSampler(data.y)
loader = NeighborLoader(data, input_nodes=data.train_mask,
                        batch_size=64, num_neighbors=[-1, -1],
                        sampler=sampler, ...)

Parameters:

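dataset (Dataset or Data or Tensor) – The dataset or class distribution from which to sample the data, given either as a Dataset, Data, or torch.Tensor object.
input_nodes (Tensor, optional) – The indices of nodes that are used by the corresponding loader, e.g., by NeighborLoader. If set to None, all nodes will be considered. This argument should only be set for node-level loaders and does not have any effect when operating on a set of graphs as given by Dataset. (default: None)
num_samples (int, optional) – The number of samples to draw for a single epoch. If set to None, will sample as many elements as there exist in the underlying data. (default: None)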
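One common way to sample according to class distribution is to weight each element by the inverse frequency of its class and then draw with replacement. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not a copy of the library's code, and `class_balanced_weights` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import List


def class_balanced_weights(labels: List[int]) -> List[float]:
    """Weight each sample by the inverse frequency of its class."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


labels = [0] * 90 + [1] * 10  # 90% majority class, 10% minority class
weights = class_balanced_weights(labels)

rng = random.Random(0)
# Draw indices with replacement, as a weighted random sampler would.
drawn = rng.choices(range(len(labels)), weights=weights, k=10_000)
share = Counter(labels[i] for i in drawn)
print(share[0] / len(drawn), share[1] / len(drawn))  # both close to 0.5
```

With these weights every class carries the same total probability mass, so minority samples are repeated (over-sampling) while many majority samples go unseen in a given epoch (under-sampling).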

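class DynamicBatchSampler(dataset: Dataset, max_num: int, mode: str = 'node', shuffle: bool = False, skip_too_big: bool = False, num_steps: Optional[int] = None)[source]

Dynamically adds samples to a mini-batch up to a maximum size (either based on number of nodes or number of edges). When data samples have a wide range in sizes, specifying a mini-batch size in terms of number of samples is not ideal and can cause CUDA OOM errors.

Within the DynamicBatchSampler, the number of steps per epoch is ambiguous, depending on the order of the samples. By default, __len__() will be undefined. This is fine for most cases, but progress bars will be infinite. Alternatively, num_steps can be supplied to cap the number of mini-batches produced by the sampler.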

from torch_geometric.loader import DataLoader, DynamicBatchSampler

sampler = DynamicBatchSampler(dataset, max_num=10000, mode="node")
loader = DataLoader(dataset, batch_sampler=sampler, ...)

Parameters:

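dataset (Dataset) – The dataset to sample from.
max_num (int) – The size of mini-batch to aim for, in number of nodes or edges.
mode (str, optional) – "node" or "edge" to measure batch size. (default: "node")
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
skip_too_big (bool, optional) – If set to True, skip samples that cannot fit in a batch on their own. (default: False)
num_steps (int, optional) – The number of mini-batches to draw for a single epoch. If set to None, will iterate through all the underlying examples, but __len__() will be None since it is ambiguous. (default: None)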
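The size-based batching described above can be sketched as a greedy loop over per-sample sizes. This is an illustrative approximation rather than the sampler's actual code: `dynamic_batches` is a hypothetical helper that receives the node (or edge) counts directly, and it places an oversized sample in a batch of its own when skip_too_big is False.

```python
from typing import Iterator, List


def dynamic_batches(sizes: List[int], max_num: int,
                    skip_too_big: bool = False) -> Iterator[List[int]]:
    """Greedily group sample indices so a batch holds at most `max_num` units."""
    batch, total = [], 0
    for idx, size in enumerate(sizes):
        if size > max_num and skip_too_big:
            continue  # this sample could never fit into a batch on its own
        if batch and total + size > max_num:
            yield batch  # adding this sample would exceed the budget
            batch, total = [], 0
        batch.append(idx)
        total += size
    if batch:
        yield batch


num_nodes = [400, 500, 300, 900, 100, 1200, 50]
print(list(dynamic_batches(num_nodes, max_num=1000)))
# [[0, 1], [2], [3, 4], [5], [6]]
print(list(dynamic_batches(num_nodes, max_num=1000, skip_too_big=True)))
# [[0, 1], [2], [3, 4], [6]]
```

Reordering num_nodes changes how many batches come out, which is why the number of steps per epoch cannot be known in advance.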

class PrefetchLoader(loader: DataLoader, device: Optional[device] = None)[source]

A GPU prefetcher class for asynchronously transferring data of a torch.utils.data.DataLoader from host memory to device memory.

Parameters:

loader (torch.utils.data.DataLoader) – The data loader.
device (torch.device, optional) – The device to load the data to. (default: None)
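The general prefetching pattern, preparing the next batch while the current one is being consumed, can be sketched with a background thread and a bounded queue. Note that this is a CPU-only stand-in chosen for illustration: it does not perform host-to-device copies, and the `prefetch` helper is hypothetical rather than part of the library.

```python
import queue
import threading
from typing import Iterable, Iterator


def prefetch(loader: Iterable, buffer_size: int = 2) -> Iterator:
    """Yield batches from `loader` while a background thread fetches ahead."""
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            for batch in loader:
                buffer.put(("batch", batch))  # blocks while the buffer is full
            buffer.put(("done", None))
        except Exception as exc:  # hand failures over to the consuming thread
            buffer.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, payload = buffer.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


batches = list(prefetch([[0, 1], [2, 3], [4]]))
print(batches)  # [[0, 1], [2, 3], [4]]
```

The bounded buffer keeps the producer at most buffer_size batches ahead, so memory use stays limited while loading and consumption overlap.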

class CachedLoader(loader: DataLoader, device: Optional[device] = None, transform: Optional[Callable] = None)[source]

A loader to cache mini-batch outputs, e.g., those obtained during NeighborLoader iterations.

Parameters:

loader (torch.utils.data.DataLoader) – The data loader.
device (torch.device, optional) – The device to load the data to. (default: None)
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)

clear()[source]: Clears the cache.
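The caching behaviour can be illustrated with a small wrapper that stores batches on the first pass and replays them afterwards. This is a sketch under assumptions, not the library class: `SimpleCachedLoader` and `CountingLoader` are hypothetical names, and device placement is omitted.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class SimpleCachedLoader:
    """Replay the mini-batches of `loader` from memory after the first pass."""

    def __init__(self, loader: Iterable, transform: Optional[Callable] = None):
        self.loader = loader
        self.transform = transform
        self._cache: List = []

    def clear(self) -> None:
        """Drop cached batches so the next pass reads from `loader` again."""
        self._cache = []

    def __iter__(self) -> Iterator:
        if self._cache:
            yield from self._cache
            return
        fresh = []
        for batch in self.loader:
            if self.transform is not None:
                batch = self.transform(batch)
            fresh.append(batch)
            yield batch
        self._cache = fresh  # only keep a cache once a full pass completed


passes = []


class CountingLoader:
    def __iter__(self):
        passes.append(1)  # record every real pass over the data
        return iter([[0, 1], [2, 3]])


cached = SimpleCachedLoader(CountingLoader(), transform=lambda b: [2 * v for v in b])
first, second = list(cached), list(cached)
print(first == second, len(passes))  # True 1
cached.clear()
list(cached)
print(len(passes))  # 2
```

Caching only pays off when the batches are deterministic across epochs. With random sampling, replaying the cache means every epoch sees the same sampled subgraphs.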

class AffinityMixin[source]

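A context manager to enable CPU affinity for data loader workers (only used when running on CPU devices).

Affinity places data loader worker threads on specific CPU cores. In effect, it allows for more efficient local memory allocation and reduces remote memory calls. Every time a process or thread moves from one core to another, registers and caches need to be flushed and reloaded. This can become very costly if it happens often, and the threads may also no longer be close to their data or be able to share data in a cache.

See here for the accompanying tutorial.

Warning

To correctly affinitize compute threads (i.e., with KMP_AFFINITY), make sure that you exclude loader_cores from the list of cores available to the main process. Otherwise, cores will be oversubscribed and performance will degrade further.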

from torch_geometric.loader import NeighborLoader

loader = NeighborLoader(data, num_workers=3)
with loader.enable_cpu_affinity(loader_cores=[0, 1, 2]):
    for batch in loader:
        pass
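The warning above amounts to keeping the loader cores and the compute cores disjoint. A minimal sketch of such a split follows. `split_cores` is a hypothetical helper and the eight-core machine is an assumption for the example.

```python
from typing import List, Tuple


def split_cores(all_cores: List[int], num_workers: int) -> Tuple[List[int], List[int]]:
    """Reserve the first `num_workers` cores for loading, the rest for compute."""
    if num_workers >= len(all_cores):
        raise ValueError("need at least one core left for the main process")
    loader_cores = all_cores[:num_workers]
    compute_cores = [c for c in all_cores if c not in loader_cores]
    return loader_cores, compute_cores


# Assumed machine with 8 cores and 3 data loader workers.
loader_cores, compute_cores = split_cores(list(range(8)), num_workers=3)
print(loader_cores, compute_cores)  # [0, 1, 2] [3, 4, 5, 6, 7]
```

The loader_cores list could then be passed to enable_cpu_affinity, while the compute set would be the one handed to the main process, for instance through an OpenMP setting such as KMP_AFFINITY or, on Linux, os.sched_setaffinity.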

enable_cpu_affinity(loader_cores: Optional[Union[List[List[int]], List[int]]] = None) → None[source]

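Enables CPU affinity.

Parameters: loader_cores ([int], optional) – List of CPU cores to which data loader workers should affinitize. By default, it will affinitize to numa0 cores. If used with the "spawn" multiprocessing context, it will automatically enable multithreading and use multiple cores per worker.
Return type: None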

class RAGQueryLoader(data: Tuple[RAGFeatureStore, RAGGraphStore], local_filter: Optional[Callable[[Data, Any], Data]] = None, seed_nodes_kwargs: Optional[Dict[str, Any]] = None, seed_edges_kwargs: Optional[Dict[str, Any]] = None, sampler_kwargs: Optional[Dict[str, Any]] = None, loader_kwargs: Optional[Dict[str, Any]] = None)[source]

A loader for making RAG queries from a remote backend.

query(query: Any) → Data[source]

Retrieves the subgraph associated with the query, together with all its feature attributes.

Return type: Data

class RAGFeatureStore(*args, **kwargs)[source]

A feature store template for remote GNN RAG backends.

abstract retrieve_seed_nodes(query: Any, **kwargs) → Union[Tensor, None, str, Tuple[str, Optional[Tensor]]][source]

Compares the query against all nodes to find the closest ones. Returns the indices of the nodes that serve as seeds for the RAG sampler.

Return type: Union[Tensor, None, str, Tuple[str, Optional[Tensor]]]

abstract retrieve_seed_edges(query: Any, **kwargs) → Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]][source]

Compares the query against all edges to find the closest ones. Returns the indices of the edges that serve as seeds for the RAG sampler.

Return type: Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]]

abstract load_subgraph(sample: Union[SamplerOutput, HeteroSamplerOutput]) → Union[Data, HeteroData][source]

Combines the sampled subgraph output with features in a Data object.

Return type: Union[Data, HeteroData]

class RAGGraphStore(*args, **kwargs)[source]

A graph store template for remote GNN RAG backends.

abstract sample_subgraph(seed_nodes: Union[Tensor, None, str, Tuple[str, Optional[Tensor]]], seed_edges: Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]], **kwargs) → Union[SamplerOutput, HeteroSamplerOutput][source]

Samples a subgraph using the seed nodes and edges.

Return type: Union[SamplerOutput, HeteroSamplerOutput]

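abstract register_feature_store(feature_store: FeatureStore)[source]: Registers a feature store to be used with the sampler. Samplers need information from the feature store in order to work properly on heterogeneous graphs.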
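How the two templates might cooperate in a query can be sketched with toy in-memory stand-ins. Everything here is hypothetical and only mirrors the method names documented above: `ToyFeatureStore`, `ToyGraphStore`, and `rag_query` are illustrative, similarity is plain word overlap, seed edges are left out, and subgraphs are dictionaries rather than Data objects.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a RAGFeatureStore subclass."""

    def __init__(self, node_text: Dict[int, str]):
        self.node_text = node_text

    def retrieve_seed_nodes(self, query: str, k: int = 1) -> List[int]:
        words = set(query.lower().split())

        def overlap(node: int) -> int:
            # Toy similarity: number of words shared with the query.
            return len(words & set(self.node_text[node].lower().split()))

        return sorted(self.node_text, key=lambda n: (-overlap(n), n))[:k]

    def load_subgraph(self, sample: Dict[str, list]) -> Dict[str, Any]:
        # Attach stored features to the sampled structure.
        return {**sample, "text": {n: self.node_text[n] for n in sample["nodes"]}}


class ToyGraphStore:
    """Hypothetical in-memory stand-in for a RAGGraphStore subclass."""

    def __init__(self, edges: List[Tuple[int, int]]):
        self.edges = edges

    def sample_subgraph(self, seed_nodes: List[int], num_hops: int = 1) -> Dict[str, list]:
        nodes = set(seed_nodes)
        for _ in range(num_hops):
            frontier = {v for u, v in self.edges if u in nodes}
            frontier |= {u for u, v in self.edges if v in nodes}
            nodes |= frontier
        kept = [(u, v) for u, v in self.edges if u in nodes and v in nodes]
        return {"nodes": sorted(nodes), "edges": kept}


def rag_query(query: str, feature_store: ToyFeatureStore, graph_store: ToyGraphStore,
              local_filter: Optional[Callable] = None) -> Dict[str, Any]:
    """Retrieve seeds, sample around them, attach features, then filter."""
    seed_nodes = feature_store.retrieve_seed_nodes(query)
    sample = graph_store.sample_subgraph(seed_nodes)
    data = feature_store.load_subgraph(sample)
    return local_filter(data, query) if local_filter is not None else data


features = ToyFeatureStore({0: "graph neural networks", 1: "neighbor sampling",
                            2: "cpu affinity", 3: "mini batch training"})
graph = ToyGraphStore([(0, 1), (1, 3), (2, 3)])
result = rag_query("neighbor sampling methods", features, graph)
print(result["nodes"], result["edges"])  # [0, 1, 3] [(0, 1), (1, 3)]
```

Keeping retrieval and feature lookup in one store and structure sampling in another lets either side be swapped for a remote backend without touching the query logic.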