dgl.sampling.sample_labors

dgl.sampling.sample_labors(g, nodes, fanout, edge_dir='in', prob=None, importance_sampling=0, random_seed=None, seed2_contribution=0, copy_ndata=True, copy_edata=True, exclude_edges=None, output_device=None)

Sampler that builds the computational dependency of node representations via labor sampling for multilayer GNNs, from the NeurIPS 2023 paper Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs.

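This sampler makes every node gather messages from a fixed number of neighbors per edge type. With the default parameters, the neighbors are picked uniformly. For every vertex t that is considered for sampling, there is a single random variate r_t.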
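The role of the per-vertex variate r_t can be illustrated with a small pure-Python sketch. It assumes the uniform LABOR-0 rule as formulated in the paper, where a seed s with in-degree d_s keeps neighbor t when r_t <= fanout / d_s. The function name `labor0_sample` and the adjacency-dict layout are hypothetical, and this is not DGL's implementation.

```python
import random

def labor0_sample(in_neighbors, seeds, fanout, rng):
    """Simplified LABOR-0 sketch: one shared variate r_t per candidate vertex."""
    # Draw a single r_t for every vertex that is a neighbor of some seed.
    candidates = sorted({t for s in seeds for t in in_neighbors[s]})
    r = {t: rng.random() for t in candidates}
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        # Uniform inclusion threshold; values >= 1 keep every neighbor.
        threshold = fanout / len(nbrs) if nbrs else 0.0
        sampled[s] = [t for t in nbrs if r[t] <= threshold]
    return sampled, r

# In-neighbors of the example graph used later on this page:
# edges 0->1, 0->2, 1->0, 1->1, 2->2, 2->0.
in_neighbors = {0: [1, 2], 1: [0, 1], 2: [0, 2]}
sampled, r = labor0_sample(in_neighbors, seeds=[0, 1], fanout=1, rng=random.Random(0))
print(sampled)
```

Because vertex 1 is a neighbor of both seeds and both seeds have the same threshold here, it is either kept for both or dropped for both. Sharing r_t in this way is what lets the sampled neighborhoods of different seeds overlap.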

For each node, a number of inbound (or outbound when edge_dir == 'out') edges will be randomly chosen. The graph returned will then contain all the nodes in the original graph, but only the sampled edges.

Node/edge features are not preserved. The original IDs of the sampled edges are stored as the dgl.EID feature in the returned graph.

Parameters:
  • g (DGLGraph) – The graph, which may have multiple node or edge types. Can be on either CPU or GPU.

  • nodes (tensor or dict) –

    Node IDs to sample neighbors from.

    This argument can take a single ID tensor or a dictionary of node types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

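  • fanout (int or dict[etype, int]) –

    The number of edges to be sampled for each node on each edge type.

    This argument can take a single int or a dictionary of edge types and ints. If a single int is given, DGL will sample this number of edges for each node for every edge type.

    If -1 is given for a single edge type, all the neighboring edges with that edge type will be selected.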

  • edge_dir (str, optional) –

    Determines whether to sample inbound or outbound edges.

    Can take either 'in' for inbound edges or 'out' for outbound edges.

  • prob (str, optional) –

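    Feature name used as the (unnormalized) probabilities associated with each neighboring edge of a node. The feature must have only one element for each edge.

    The features must be non-negative floats, and the sum of the features of the inbound/outbound edges of every node must be positive (though they do not have to sum to one). Otherwise, the result is undefined.

    If prob is not None, GPU sampling is not supported.

  • importance_sampling (int, optional) – Whether to use importance sampling or uniform sampling. A negative value optimizes the importance sampling probabilities until convergence, while a positive value runs that many optimization steps. If the value is i, the LABOR-i variant is used.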

  • random_seed (tensor) –

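    An int64 tensor with one element.

    The passed random_seed ensures that, for any seed vertex s and its neighbor t, the random variate r_t is the same across all calls to this function that use the same random seed. When sampling as part of the same batch, use identical seeds so that LABOR can sample globally. For example, on heterogeneous graphs a single random seed is passed for every edge type, which samples far fewer vertices than using a unique random seed per edge type. If this function were called separately for each edge type of a heterogeneous graph with different random seeds, LABOR would run locally on each edge type and sample a larger number of vertices.

    If this function is called without a random_seed, the seed is obtained by drawing a random number from DGL. Use the same random_seed argument if you call this function multiple times to sample as part of a single batch.

  • seed2_contribution (float, optional) – A float value in [0, 1) that determines the contribution of the second random seed when generating the random variates for the LABOR sampling algorithm.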

  • copy_ndata (bool, optional) –

    If True, the node features of the new graph are copied from the original graph. If False, the new graph will not have any node features.

    (Default: True)

  • copy_edata (bool, optional) –

    If True, the edge features of the new graph are copied from the original graph. If False, the new graph will not have any edge features.

    (Default: True)

  • exclude_edges (tensor or dict) –

    Edge IDs to exclude during sampling neighbors for the seed nodes.

    This argument can take a single ID tensor or a dictionary of edge types and ID tensors. If a single tensor is given, the graph must only have one type of nodes.

  • output_device (Framework-specific device context object, optional) – The output device. Default is the same as the input graph.
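The effect of reusing random_seed across calls within one batch can be sketched without DGL. The helper below derives r_t deterministically from a (seed, vertex) pair with a hash; this is an assumption made purely for illustration and is not how DGL generates its variates. The names `variate` and `sample_etype` and the toy neighbor lists are hypothetical.

```python
import hashlib

def variate(seed, t):
    """Deterministic stand-in for r_t in [0, 1), keyed by (seed, vertex)."""
    digest = hashlib.sha256(f"{seed}:{t}".encode()).digest()
    # Keep 53 bits so the quotient is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def sample_etype(in_neighbors, seeds, fanout, seed):
    """Vertices kept for one edge type under a uniform LABOR-0 style rule."""
    kept = set()
    for s in seeds:
        nbrs = in_neighbors[s]
        if not nbrs:
            continue
        kept.update(t for t in nbrs if variate(seed, t) <= fanout / len(nbrs))
    return kept

# Two edge types whose seed nodes happen to share the same source vertices.
etype_a = {0: [10, 11, 12, 13], 1: [11, 12, 13, 14]}
etype_b = {0: [10, 11, 12, 13], 1: [11, 12, 13, 14]}

# Same seed for both calls: identical r_t, so the same vertices are reused.
shared = sample_etype(etype_a, [0, 1], 2, seed=7) | sample_etype(etype_b, [0, 1], 2, seed=7)
# Different seeds: each call rolls its own r_t and may pick other vertices.
separate = sample_etype(etype_a, [0, 1], 2, seed=7) | sample_etype(etype_b, [0, 1], 2, seed=8)
print(len(shared), len(separate))
```

In this toy setup the vertex set obtained with a shared seed is always a subset of the one obtained with separate seeds, which mirrors the claim that one seed per batch tends to sample fewer vertices.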

Returns:

A sampled subgraph containing only the sampled neighboring edges, along with the edge weights.

Return type:

tuple(DGLGraph, list[Tensor])

Notes

If copy_ndata or copy_edata is True, the same tensors are used as the node or edge features of both the original graph and the new graph. As a result, users should avoid performing in-place operations on the features of the new graph, since doing so would corrupt the features of the original graph.

Examples

Assume that you have the following graph

>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))

and the weights

>>> g.edata['prob'] = torch.FloatTensor([0., 1., 0., 1., 0., 1.])

To sample one inbound edge for node 0 and node 1:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 1)
>>> sg.edges(order='eid')
(tensor([1, 0]), tensor([0, 1]))
>>> sg.edata[dgl.EID]
tensor([2, 0])

To sample one inbound edge for node 0 and node 1 with probability in edge feature prob:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 1, prob='prob')
>>> sg.edges(order='eid')
(tensor([2, 1]), tensor([0, 1]))

With fanout greater than the number of actual neighbors and without replacement, DGL will take all neighbors instead:

>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 3)
>>> sg.edges(order='eid')
(tensor([1, 2, 0, 1]), tensor([0, 0, 1, 1]))
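Why a large fanout ends up selecting every neighbor can be worked through numerically. The sketch below assumes the uniform LABOR-0 formulation from the paper, in which each neighbor of a seed with degree d is kept with probability min(1, fanout / d), so the expected number of kept neighbors is min(fanout, d). The helper names are hypothetical and the code does not call DGL.

```python
def inclusion_probability(fanout, degree):
    """Per-neighbor keep probability under a uniform LABOR-0 style rule."""
    if degree == 0:
        return 0.0
    # fanout == -1 means "take every neighbor", as described for the fanout parameter.
    if fanout == -1:
        return 1.0
    return min(1.0, fanout / degree)

def expected_sampled(fanout, degree):
    """Expected number of neighbors kept for one seed node."""
    return degree * inclusion_probability(fanout, degree)

for fanout, degree in [(1, 2), (3, 2), (-1, 5)]:
    print(fanout, degree, expected_sampled(fanout, degree))
```

With fanout 3 and degree 2 the probability is clamped to 1, so both neighbors are always kept. With fanout 1 and degree 2 the count is 1 only in expectation under this assumption, so individual draws may differ.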

To exclude certain edge IDs during sampling for the seed nodes:

>>> g = dgl.graph(([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]))
>>> g_edges = g.all_edges(form='all')
>>> g_edges
(tensor([0, 0, 1, 1, 2, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> sg, weights = dgl.sampling.sample_labors(g, [0, 1], 3, exclude_edges=[0, 1, 2])
>>> sg.all_edges(form='all')
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3],g_edges[1][:3])
tensor([False, False, False])
>>> g = dgl.heterograph({
...   ('drug', 'interacts', 'drug'): ([0, 0, 1, 1, 3, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'interacts', 'gene'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0]),
...   ('drug', 'treats', 'disease'): ([0, 0, 1, 1, 2, 2], [1, 2, 0, 1, 2, 0])})
>>> g_edges = g.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
>>> g_edges
(tensor([0, 0, 1, 1, 3, 2]), tensor([1, 2, 0, 1, 2, 0]), tensor([0, 1, 2, 3, 4, 5]))
>>> excluded_edges = {('drug', 'interacts', 'drug'): g_edges[2][:3]}
>>> sg, weights = dgl.sampling.sample_labors(g, {'drug':[0, 1]}, 3, exclude_edges=excluded_edges)
>>> sg.all_edges(form='all', etype=('drug', 'interacts', 'drug'))
(tensor([2, 1]), tensor([0, 1]), tensor([0, 1]))
>>> sg.has_edges_between(g_edges[0][:3],g_edges[1][:3],etype=('drug', 'interacts', 'drug'))
tensor([False, False, False])