FusedCSCSamplingGraph

class dgl.graphbolt.FusedCSCSamplingGraph(c_csc_graph: ScriptObject)[source]

Bases: SamplingGraph

A sampling graph in CSC format.

copy_to_shared_memory(shared_memory_name: str)[source]

Copy the graph to shared memory.

Parameters:: shared_memory_name (str) – Name of the shared memory.
Returns:: The copied FusedCSCSamplingGraph object on shared memory.
Return type:: FusedCSCSamplingGraph

in_subgraph(nodes: Tensor | Dict[str, Tensor]) → SampledSubgraphImpl[source]

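Return the subgraph induced on the inbound edges of the given nodes.

An in subgraph is equivalent to creating a new graph from the incoming edges of the given nodes. The subgraph is compacted according to the order of the passed-in nodes.

Parameters:

nodes (torch.Tensor or Dict[str, torch.Tensor]) –

IDs of the given seed nodes.

If nodes is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
If nodes is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.

Returns:

The in subgraph.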

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {
...     "N0:R0:N0": 0, "N0:R1:N1": 1, "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {"N0":torch.LongTensor([1]), "N1":torch.LongTensor([1, 2])}
>>> in_subgraph = graph.in_subgraph(nodes)
>>> print(in_subgraph.sampled_csc)
{'N0:R0:N0': CSCFormatBase(indptr=tensor([0, 0]),
      indices=tensor([], dtype=torch.int64),
), 'N0:R1:N1': CSCFormatBase(indptr=tensor([0, 1, 2]),
            indices=tensor([1, 0]),
), 'N1:R2:N0': CSCFormatBase(indptr=tensor([0, 2]),
            indices=tensor([0, 1]),
), 'N1:R3:N1': CSCFormatBase(indptr=tensor([0, 1, 3]),
            indices=tensor([0, 1, 2]),
)}
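To make the slicing behind an in subgraph concrete, here is a minimal pure-Python sketch for the homogeneous case. It is an illustration of the CSC layout only, not GraphBolt's implementation: the helper name `in_subgraph_csc` is hypothetical, plain lists stand in for tensors, and source IDs are left as the original homogeneous IDs rather than being relabeled per type.

```python
def in_subgraph_csc(indptr, indices, nodes):
    """Return (sub_indptr, sub_indices) holding only the in-edges of `nodes`.

    Hypothetical helper for illustration; columns follow the order of `nodes`.
    """
    sub_indptr = [0]
    sub_indices = []
    for v in nodes:
        # The sources of the in-edges of v are indices[indptr[v]:indptr[v + 1]].
        sub_indices.extend(indices[indptr[v]:indptr[v + 1]])
        sub_indptr.append(len(sub_indices))
    return sub_indptr, sub_indices


# Same indptr/indices as the example above, treated as a homogeneous graph.
indptr = [0, 3, 5, 7, 9, 12]
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]

# Homogeneous IDs 1, 3, 4 correspond to N0:[1] and N1:[1, 2] in the example.
sub_indptr, sub_indices = in_subgraph_csc(indptr, indices, [1, 3, 4])
print(sub_indptr)   # [0, 2, 4, 7]
print(sub_indices)  # [2, 3, 1, 2, 0, 3, 4]
```

Splitting these edges by edge type and converting the source IDs to per-type IDs gives the per-type structures printed in the example.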

pin_memory_()[source]: 将FusedCSCSamplingGraph复制到固定的内存中。返回原地修改的相同对象。

sample_layer_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, random_seed: Tensor | None = None, seed2_contribution: float = 0.0) → SampledSubgraphImpl[source]

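Sample neighboring edges of the given nodes and return the induced subgraph via layer-neighbor sampling (LABOR) from the NeurIPS 2023 paper Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs.

Parameters:

seeds (torch.Tensor or Dict[str, torch.Tensor]) –
IDs of the given seed nodes.
- If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
- If seeds is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node, with or without considering edge types.
- When the length is 1, the fanout applies to all neighbors of the node as a whole, regardless of edge type.
- Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
The value of each fanout should be >= 0 or = -1.
- When the value is -1, all neighbors (with non-zero probability, if weighted) are sampled once regardless of replacement. This is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to False).
- When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.
random_seed (torch.Tensor, optional) –
An int64 tensor with one or two elements.

The passed random_seed ensures that, for any seed node s and its neighbor t, the rolled random variate r_t is the same in every call to this function that uses the same random seed. When sampling as part of the same batch, use identical seeds so that LABOR can sample globally. For example, on a heterogeneous graph the same random seed is passed for every edge type; this samples far fewer nodes than using a unique random seed per edge type. If this function were called separately for each edge type of a heterogeneous graph with different random seeds, LABOR would run locally within each edge type, and a larger number of nodes would be sampled.

If this function is called without a random_seed, the random seed is obtained by drawing a random number from GraphBolt. To sample as part of a single batch across multiple calls to this function, pass the same random_seed argument to each call.

If two numbers are given, the seed2_contribution argument determines the interpolation between the two random seeds.
seed2_contribution (float, optional) – A float value in [0, 1) that determines the contribution of the second random seed, random_seed[-1], to the generated random variates.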

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_layer_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([2]),
)}
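The effect of sharing a random seed can be illustrated with a deliberately simplified pure-Python sketch. It is not GraphBolt's algorithm: the helpers `variate` and `sample_with_shared_variates` are hypothetical, and the selection rule (keep the `fanout` neighbors with the smallest r_t) is an assumption chosen only to show why a variate that depends on (random_seed, t) lets different seed nodes agree on which neighbors to keep.

```python
import random


def variate(random_seed, t):
    """Hypothetical helper: a uniform variate r_t fixed by (random_seed, t)."""
    return random.Random(random_seed * 1_000_003 + t).random()


def sample_with_shared_variates(neighbors, fanout, random_seed):
    """Keep the `fanout` neighbors with the smallest r_t (simplified rule)."""
    ranked = sorted(neighbors, key=lambda t: variate(random_seed, t))
    return set(ranked[:fanout])


# Two seed nodes that happen to have the same ten candidate neighbors.
neighbors_a = list(range(10))
neighbors_b = list(range(10))

# Same random seed: both seed nodes rank neighbors identically.
shared = (sample_with_shared_variates(neighbors_a, 3, 42)
          | sample_with_shared_variates(neighbors_b, 3, 42))

# Different random seeds: the picks are independent and may not overlap.
separate = (sample_with_shared_variates(neighbors_a, 3, 42)
            | sample_with_shared_variates(neighbors_b, 3, 7))

print(len(shared))    # 3
print(len(separate))  # between 3 and 6
```

With a shared seed the union of sampled neighbors stays at the fanout, whereas independent seeds can only match or exceed it, which mirrors the note above about sampling fewer nodes.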

sample_negative_edges_uniform(edge_type, node_pairs, negative_ratio)[source]

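Sample negative edges by randomly choosing negative source-destination edges according to a uniform distribution. For each edge (u, v), it generates negative_ratio negative edges (u, v'), where v' is chosen uniformly from all the nodes in the graph and u is exactly the source of the corresponding positive edge. It returns the positive edges concatenated with the negative edges.

Parameters:

edge_type (str) – The type of the edges in the provided node_pairs. Any sampled negative edges will have the same type. If set to None, the graph is treated as homogeneous.
node_pairs (torch.Tensor) – A 2D tensor representing N pairs of positive edges in source-destination format, where "positive" means that these edges are present in the graph. Note that for a heterogeneous graph, the IDs in this tensor are heterogeneous IDs.
negative_ratio (int) – The ratio of the number of negative samples to the number of positive samples.

Returns:

A 2D tensor representing the N positive source-destination node pairs followed by the sampled negative pairs. For a heterogeneous graph, both the input nodes and the selected nodes are represented by heterogeneous IDs, and the formed edges are of the input type edge_type. Note that the negatives may be false negatives: a sampled edge may or may not be present in the graph.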

Return type:

torch.Tensor
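The description above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's implementation: the function name, the explicit `num_nodes` and `rng` arguments, the use of tuples instead of a 2D tensor, and the ordering of negatives (grouped by positive edge) are all hypothetical.

```python
import random


def sample_negative_edges_uniform_sketch(node_pairs, negative_ratio, num_nodes, rng):
    """Return positive pairs followed by uniformly sampled negative pairs."""
    negatives = []
    for u, _v in node_pairs:
        for _ in range(negative_ratio):
            # Keep the source u; draw the destination uniformly from all nodes.
            negatives.append((u, rng.randrange(num_nodes)))
    return list(node_pairs) + negatives


positive_pairs = [(0, 1), (2, 3)]
result = sample_negative_edges_uniform_sketch(
    positive_pairs, negative_ratio=2, num_nodes=5, rng=random.Random(0))

print(len(result))  # 6, i.e. N * (1 + negative_ratio)
print(result[:2])   # [(0, 1), (2, 3)]
```

Because destinations are drawn from all nodes without checking the graph, a sampled pair can coincide with an existing edge, which is the false-negative caveat mentioned in the return description.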

sample_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None) → SampledSubgraphImpl[source]

Sample neighboring edges of the given nodes and return the induced subgraph.

Parameters:

seeds (torch.Tensor or Dict[str, torch.Tensor]) –
IDs of the given seed nodes.
- If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
- If seeds is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node, with or without considering edge types.
- When the length is 1, the fanout applies to all neighbors of the node as a whole, regardless of edge type.
- Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
The value of each fanout should be >= 0 or = -1.
- When the value is -1, all neighbors (with non-zero probability, if weighted) are sampled once regardless of replacement. This is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to False).
- When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute to use. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl

Examples

>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
            indices=tensor([2]),
)}
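The fanout and replace rules for a single seed node can be summarized in a small pure-Python sketch. It covers unweighted sampling only and is not GraphBolt's implementation: the helper name `sample_neighbors_sketch` and the explicit `rng` argument are hypothetical, and per-edge-type fanouts would apply the same rule to each type's neighbor list separately.

```python
import random


def sample_neighbors_sketch(neighbors, fanout, replace, rng):
    """Pick neighbors of one seed node following the fanout/replace rules."""
    if fanout == -1:
        # -1 selects every neighbor once, regardless of `replace`.
        return list(neighbors)
    if replace:
        # With replacement, a neighbor may be drawn more than once.
        return rng.choices(neighbors, k=fanout) if neighbors else []
    # Without replacement, at most len(neighbors) distinct neighbors exist.
    return rng.sample(neighbors, min(fanout, len(neighbors)))


rng = random.Random(0)
everything = sample_neighbors_sketch([2, 4], -1, False, rng)
picked = sample_neighbors_sketch([2, 4, 7], 2, False, rng)
capped = sample_neighbors_sketch([2, 4], 5, False, rng)
repeated = sample_neighbors_sketch([2, 4], 5, True, rng)

print(everything)      # [2, 4]
print(len(picked))     # 2
print(sorted(capped))  # [2, 4]
print(len(repeated))   # 5
```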

temporal_sample_neighbors(nodes: Tensor | Dict[str, Tensor], input_nodes_timestamp: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, node_timestamp_attr_name: str | None = None, edge_timestamp_attr_name: str | None = None) → ScriptObject[source]

Temporally sample neighboring edges of the given nodes and return the induced subgraph.

If node_timestamp_attr_name or edge_timestamp_attr_name is given, the sampled neighbors or edges of an input node must have a timestamp earlier than that of the input node.

Parameters:

nodes (torch.Tensor) – IDs of the given seed nodes.
input_nodes_timestamp (torch.Tensor) – Timestamps of the given seed nodes.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node, with or without considering edge types.
- When the length is 1, the fanout applies to all neighbors of the node as a whole, regardless of edge type.
- Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
The value of each fanout should be >= 0 or = -1.
- When the value is -1, all neighbors (with non-zero probability, if weighted) are sampled once regardless of replacement. This is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to False).
- When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sampling is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.
node_timestamp_attr_name (str, optional) – An optional string specifying the name of the node attribute that holds node timestamps.
edge_timestamp_attr_name (str, optional) – An optional string specifying the name of the edge attribute that holds edge timestamps.

Returns:

The sampled subgraph.

Return type:

SampledSubgraphImpl
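The temporal constraint can be pictured as a filter applied before ordinary neighbor sampling. The sketch below is a pure-Python illustration under assumptions, not the library's code: the helper `temporal_candidates` is hypothetical, timestamps are plain integers, and the comparison is taken to be strictly "earlier than", following the description above.

```python
import random


def temporal_candidates(neighbors, neighbor_timestamps, seed_timestamp):
    """Keep only neighbors whose timestamp is earlier than the seed's."""
    return [t for t, ts in zip(neighbors, neighbor_timestamps)
            if ts < seed_timestamp]


neighbors = [10, 11, 12, 13]
timestamps = [5, 9, 2, 7]

# Only neighbors stamped before time 7 remain eligible.
candidates = temporal_candidates(neighbors, timestamps, seed_timestamp=7)
print(candidates)  # [10, 12]

# Ordinary fanout sampling then runs on the eligible neighbors only.
fanout = 1
sampled = random.Random(0).sample(candidates, min(fanout, len(candidates)))
print(len(sampled))  # 1
```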

to(device: device) → None[source]: Copy the FusedCSCSamplingGraph to the specified device.

property csc_indptr: tensor

Returns the indices pointer in the CSC graph.

Returns:: The indices pointer in the CSC graph. An integer tensor with shape (total_num_nodes+1,).
Return type:: torch.tensor
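As a quick illustration of what the index pointer encodes, the following pure-Python sketch derives node count, edge count, and in-degrees from an indptr list. Plain lists stand in for tensors, and the values are taken from the examples on this page.

```python
indptr = [0, 3, 5, 7, 9, 12]

# One entry per node plus a trailing total, hence total_num_nodes + 1 values.
total_num_nodes = len(indptr) - 1
total_num_edges = indptr[-1]

# The in-degree of node v is the width of its slice of the indices array.
in_degrees = [indptr[v + 1] - indptr[v] for v in range(total_num_nodes)]

print(total_num_nodes)  # 5
print(total_num_edges)  # 12
print(in_degrees)       # [3, 2, 2, 2, 3]
```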

property edge_attributes: Dict[str, Tensor] | None

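Returns the edge attributes dictionary.

Returns:: If present, returns a dictionary of edge attributes. Each key is the name of an attribute, and the corresponding value holds that attribute's values. The length of each value should match the total number of edges.
Return type:: Dict[str, torch.Tensor] or None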

property edge_type_to_id: Dict[str, int] | None

Returns the edge type to ID dictionary if present.

Returns:: If present, returns a dictionary mapping edge types to edge type IDs.
Return type:: Dict[str, int] or None

property indices: tensor

Returns the indices in the CSC graph.

Returns:: The indices in the CSC graph. An integer tensor with shape (total_num_edges,).
Return type:: torch.tensor

Notes

It is assumed that the edges of each node are already sorted by edge type ID.
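The sorting assumption can be checked on plain lists with a short sketch like the one below. The helper `edges_sorted_by_type` is hypothetical and is not part of GraphBolt; the data are the indptr, indices, and type_per_edge values used in the examples on this page.

```python
indptr = [0, 3, 5, 7, 9, 12]
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]
type_per_edge = [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3]


def edges_sorted_by_type(indptr, type_per_edge):
    """Check that, within each node's slice, edge type IDs are non-decreasing."""
    for v in range(len(indptr) - 1):
        types = type_per_edge[indptr[v]:indptr[v + 1]]
        if types != sorted(types):
            return False
    return True


print(edges_sorted_by_type(indptr, type_per_edge))  # True

# The in-neighbors (source nodes) of node 4 are one slice of `indices`.
print(indices[indptr[4]:indptr[5]])  # [0, 3, 4]
```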

property node_attributes: Dict[str, Tensor] | None

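Returns the node attributes dictionary.

Returns:: If present, returns a dictionary of node attributes. Each key is the name of an attribute, and the corresponding value holds that attribute's values. The length of each value should match the total number of nodes.
Return type:: Dict[str, torch.Tensor] or None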

property node_type_offset: Tensor | None

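Returns the node type offset tensor if present. Do not modify the returned tensor in place.

Returns:: If present, returns a 1D integer tensor of shape (num_node_types + 1,). The tensor is in ascending order because nodes of the same type have contiguous IDs, and larger node IDs are paired with larger node type IDs. The first value is 0 and the last value is the number of nodes. Nodes with IDs in the range [node_type_offset_[i], node_type_offset_[i+1]) are of type ID i.
Return type:: torch.Tensor or None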
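The offset layout makes it cheap to convert a homogeneous node ID into a (type ID, per-type ID) pair. The sketch below shows one way to do this with a binary search over a plain list; the helper `to_typed_id` is hypothetical and is not a GraphBolt function.

```python
from bisect import bisect_right


def to_typed_id(node_type_offset, homogeneous_id):
    """Map a homogeneous node ID to (type_id, id_within_type)."""
    # The last offset that is <= homogeneous_id marks the start of its type.
    type_id = bisect_right(node_type_offset, homogeneous_id) - 1
    return type_id, homogeneous_id - node_type_offset[type_id]


node_type_offset = [0, 2, 5]

print(to_typed_id(node_type_offset, 1))  # (0, 1)
print(to_typed_id(node_type_offset, 2))  # (1, 0)
print(to_typed_id(node_type_offset, 4))  # (1, 2)

# Consecutive differences give the number of nodes of each type.
counts = [b - a for a, b in zip(node_type_offset, node_type_offset[1:])]
print(counts)  # [2, 3]
```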

property node_type_to_id: Dict[str, int] | None

Returns the node type to ID dictionary if present.

Returns:: If present, returns a dictionary mapping node types to node type IDs.
Return type:: Dict[str, int] or None

property num_edges: int | Dict[str, int]

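The number of edges in the graph.
- If the graph is homogeneous, returns an integer.
- If the graph is heterogeneous, returns a dictionary.

Returns:: The number of edges. An integer is the total number of edges of a homogeneous graph; a dictionary gives the number of edges per edge type of a heterogeneous graph.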
Return type:: Union[int, Dict[str, int]]

Examples

>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> print(graph.num_edges)
{'N0:R0:N0': 2, 'N0:R1:N1': 4, 'N1:R2:N0': 3, 'N1:R3:N1': 3}

property num_nodes: int | Dict[str, int]

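The number of nodes in the graph.
- If the graph is homogeneous, returns an integer.
- If the graph is heterogeneous, returns a dictionary.

Returns:: The number of nodes. An integer is the total number of nodes of a homogeneous graph; a dictionary gives the number of nodes per node type of a heterogeneous graph.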
Return type:: Union[int, Dict[str, int]]

Examples

>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> print(graph.num_nodes)
{'N0': 2, 'N1': 3}

property total_num_edges: int

Returns the number of edges in the graph.

Returns:: The number of edges in the graph.
Return type:: int

property total_num_nodes: int

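Returns the number of nodes in the graph.

Returns:: The number of nodes in the graph, i.e. the number of rows in the dense format.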
Return type:: int

property type_per_edge: Tensor | None

Returns the edge type tensor if present.

Returns:: If present, returns a 1D integer tensor of shape (total_num_edges,) containing the type of each edge in the graph.
Return type:: torch.Tensor or None