LayerNeighborSampler

class dgl.graphbolt.LayerNeighborSampler(datapipe, graph, fanouts, replace=False, prob_name=None, deduplicate=True, layer_dependency=False, batch_dependency=1)

Bases: NeighborSamplerImpl

Sample neighbor edges from a graph and return a subgraph.

Functional name: sample_layer_neighbor.

Sampler that builds the computational dependency of node representations via labor sampling for multilayer GNNs, from the NeurIPS 2023 paper "Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs".

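The layer-neighbor sampler samples a subgraph from the given data. It returns an induced subgraph along with compacted information. For node classification, the sampler uses the provided nodes directly as seed nodes. For link prediction, an extra preprocessing step is needed: the unique nodes are gathered from the given node pairs, covering both positive and negative pairs, and these nodes are used as the seed nodes for the subsequent steps.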
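The seed-gathering step for link prediction can be sketched in plain Python. This is a minimal illustration, not GraphBolt's implementation: the helper name `compact_node_pairs` and the two negative pairs are hypothetical, and assigning local IDs in order of first appearance is an assumption made for the sketch.

```python
def compact_node_pairs(pairs):
    """Collect the unique nodes of `pairs` and rewrite the pairs with local IDs."""
    local_id = {}
    for src, dst in pairs:
        for node in (src, dst):
            if node not in local_id:
                # Assumed convention: IDs follow the order of first appearance.
                local_id[node] = len(local_id)
    unique_nodes = list(local_id)  # dicts preserve insertion order
    compacted = [(local_id[src], local_id[dst]) for src, dst in pairs]
    return unique_nodes, compacted


# One positive pair followed by two hypothetical negative pairs.
pairs = [(0, 1), (0, 5), (0, 2)]
seeds, compacted_pairs = compact_node_pairs(pairs)
```

Here `seeds` plays the role of the seed nodes passed on to sampling, while `compacted_pairs` indexes into `seeds` instead of the original node ID space.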

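Implements the approach described in Appendix A.3 of the paper. It is similar to dgl.dataloading.LaborSampler, but uses sequential Poisson sampling instead of Poisson sampling so that the number of sampled edges per vertex is deterministic, as in NeighborSampler. It is therefore a drop-in replacement for NeighborSampler. Unlike NeighborSampler, however, it samples fewer vertices and edges in multilayer GNN scenarios without harming convergence speed with respect to training iterations.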
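The difference between the two sampling schemes can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the GraphBolt implementation: edge weights are uniform, the function names and the toy adjacency `in_neighbors` are hypothetical, and each source vertex has a single random variate `r[t]` shared by all seeds.

```python
import random


def poisson_sample(neighbors, fanout, r):
    """Keep neighbor t independently when r[t] <= fanout / degree (random count)."""
    if not neighbors:
        return []
    pi = min(1.0, fanout / len(neighbors))
    return [t for t in neighbors if r[t] <= pi]


def sequential_poisson_sample(neighbors, fanout, r):
    """Keep the `fanout` neighbors with the smallest variates (fixed count)."""
    k = min(fanout, len(neighbors))
    return sorted(neighbors, key=lambda t: r[t])[:k]


rng = random.Random(0)
num_nodes = 8
# One variate per source vertex, shared by every seed that has it as a neighbor.
r = [rng.random() for _ in range(num_nodes)]

# Hypothetical in-neighbor lists for three seed vertices.
in_neighbors = {0: [1, 2, 3, 4, 5], 6: [1, 2, 3, 4, 5], 7: [2, 3]}
fanout = 2

poisson = {s: poisson_sample(n, fanout, r) for s, n in in_neighbors.items()}
sequential = {s: sequential_poisson_sample(n, fanout, r) for s, n in in_neighbors.items()}
```

With sequential Poisson sampling every seed receives exactly `min(fanout, degree)` neighbors, while plain Poisson sampling only matches that count in expectation. Because the variates are shared, seeds 0 and 6 pick the same neighbors, which is how the number of distinct sampled vertices stays small.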

Parameters:
  • datapipe (DataPipe) – The datapipe.

  • graph (FusedCSCSamplingGraph) – The graph on which to perform subgraph sampling.

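  • fanouts (list[torch.Tensor]) – The number of edges to be sampled for each node, with or without considering edge types. The length of this list implicitly determines the number of sampling layers.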

  • replace (bool) – Boolean indicating whether sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.

  • prob_name (str, optional) – The name of an edge attribute used as the weights of sampling for each node. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equalling the total number of edges.

  • deduplicate (bool) – Boolean indicating whether seeds between hops will be deduplicated. If True, duplicate elements in seeds are reduced to a single occurrence. Otherwise, duplicates are kept.

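  • layer_dependency (bool) – Boolean indicating whether different layers should use the same random variates. This reduces the number of sampled nodes and turns LayerNeighborSampler into a subgraph sampling method. Later layers are guaranteed to sample neighbors that overlap with those of the previous layers.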

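  • batch_dependency (int) – Specifies whether consecutive minibatches should use similar random variates. This results in higher temporal access locality of the sampled nodes and edges. Setting it to \(\kappa\) slows down the change in the random variates proportionally to \(\frac{1}{\kappa}\). Implements the dependent minibatching approach in arXiv:2310.12403.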
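The effect of batch_dependency on the random variates can be pictured with the following sketch. It is an assumption-laden illustration, not the GraphBolt implementation: the functions `dependent_variates` and `mean_change`, the seeding scheme, and the cosine/sine blend of two normal variates are all chosen here only to show variates that change at a rate proportional to \(\frac{1}{\kappa}\).

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF; maps a normal variate to a uniform one in [0, 1]."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dependent_variates(num_nodes, kappa, step, base_seed=0):
    """Per-node uniform variates for minibatch `step` (illustrative only)."""
    epoch, phase = divmod(step, kappa)
    old = random.Random(base_seed + epoch)      # variates being phased out
    new = random.Random(base_seed + epoch + 1)  # variates being phased in
    theta = (math.pi / 2) * phase / kappa       # advances by pi / (2 * kappa) per step
    # cos^2 + sin^2 = 1, so the blend of two standard normals stays standard normal.
    return [
        normal_cdf(math.cos(theta) * old.gauss(0, 1) + math.sin(theta) * new.gauss(0, 1))
        for _ in range(num_nodes)
    ]


def mean_change(kappa, num_nodes=1000):
    """Average absolute change of the variates between steps 0 and 1."""
    before = dependent_variates(num_nodes, kappa, step=0)
    after = dependent_variates(num_nodes, kappa, step=1)
    return sum(abs(x - y) for x, y in zip(before, after)) / num_nodes


independent = mean_change(kappa=1)  # fresh variates every minibatch
dependent = mean_change(kappa=8)    # variates drift slowly between minibatches
```

In this sketch a larger `kappa` makes consecutive minibatches rank the same vertices similarly, so they tend to sample overlapping nodes and edges, which is the source of the improved temporal locality described above.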

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> indptr = torch.LongTensor([0, 2, 4, 5, 6, 7, 8])
>>> indices = torch.LongTensor([1, 2, 0, 3, 5, 4, 3, 5])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices)
>>> seeds = torch.LongTensor([[0, 1], [1, 2]])
>>> item_set = gb.ItemSet(seeds, names="seeds")
>>> item_sampler = gb.ItemSampler(item_set, batch_size=1)
>>> neg_sampler = gb.UniformNegativeSampler(item_sampler, graph, 2)
>>> fanouts = [torch.LongTensor([5]),
...     torch.LongTensor([10]), torch.LongTensor([15])]
>>> subgraph_sampler = gb.LayerNeighborSampler(neg_sampler, graph, fanouts)
>>> next(iter(subgraph_sampler)).sampled_subgraphs
[SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6, 7, 8]),
        indices=tensor([1, 3, 0, 4, 2, 2, 5, 4]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3, 4]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2, 3, 4]),
),
SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6, 7]),
        indices=tensor([1, 3, 0, 4, 2, 2, 5]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3, 4]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2, 3]),
),
SampledSubgraphImpl(sampled_csc=CSCFormatBase(
        indptr=tensor([0, 2, 4, 5, 6]),
        indices=tensor([1, 3, 0, 4, 2, 2]),
    ),
    original_row_node_ids=tensor([0, 1, 5, 2, 3]),
    original_edge_ids=None,
    original_column_node_ids=tensor([0, 1, 5, 2]),
)]
>>> next(iter(subgraph_sampler)).compacted_seeds
tensor([[0, 1], [0, 2], [0, 3]])
>>> next(iter(subgraph_sampler)).labels
tensor([1., 0., 0.])
>>> next(iter(subgraph_sampler)).indexes
tensor([0, 0, 0])
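To read the sampled output, it can help to expand a compacted CSC structure back into edges over the original node IDs. The sketch below uses plain Python lists copied from the first sampled subgraph above; the helper `csc_to_edges` is hypothetical and not part of GraphBolt. It assumes the usual CSC convention in which each column is a destination node and `indices` holds the local row IDs of its source nodes.

```python
def csc_to_edges(indptr, indices, row_ids, column_ids):
    """Expand a compacted CSC structure into (src, dst) pairs of original IDs."""
    edges = []
    for col in range(len(indptr) - 1):
        dst = column_ids[col]
        # indices[indptr[col]:indptr[col + 1]] are the local row IDs of the sources.
        for pos in range(indptr[col], indptr[col + 1]):
            edges.append((row_ids[indices[pos]], dst))
    return edges


# Values copied from the first SampledSubgraphImpl in the example output.
indptr = [0, 2, 4, 5, 6, 7, 8]
indices = [1, 3, 0, 4, 2, 2, 5, 4]
node_ids = [0, 1, 5, 2, 3, 4]  # original_row_node_ids == original_column_node_ids

edges = csc_to_edges(indptr, indices, node_ids, node_ids)
```

Each resulting pair is an edge of the input graph built at the top of the example: node 0 has in-neighbors 1 and 2 there, matching the first two pairs.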