FusedCSCSamplingGraph
- class dgl.graphbolt.FusedCSCSamplingGraph(c_csc_graph: ScriptObject)[source]
Bases: SamplingGraph
A sampling graph in CSC format.
Copy the graph to shared memory.
- Parameters:
shared_memory_name (str) – Name of the shared memory.
- Returns:
The copied FusedCSCSamplingGraph object on shared memory.
- Return type:
FusedCSCSamplingGraph
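- in_subgraph(nodes: Tensor | Dict[str, Tensor]) → SampledSubgraphImpl [source]
Return the subgraph induced on the inbound edges of the given nodes.
An in subgraph is equivalent to creating a new graph from the incoming edges of the given nodes. The subgraph is compacted according to the order of the passed-in nodes.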
- Parameters:
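nodes (torch.Tensor or Dict[str, torch.Tensor]) –
- IDs of the given seed nodes.
If nodes is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
If nodes is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.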
- Returns:
The in subgraph.
- Return type:
SampledSubgraphImpl
Examples
>>> import dgl.graphbolt as gb
>>> import torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {
...     "N0:R0:N0": 0, "N0:R1:N1": 1, "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {"N0": torch.LongTensor([1]), "N1": torch.LongTensor([1, 2])}
>>> in_subgraph = graph.in_subgraph(nodes)
>>> print(in_subgraph.sampled_csc)
{'N0:R0:N0': CSCFormatBase(indptr=tensor([0, 0]),
              indices=tensor([], dtype=torch.int64),
), 'N0:R1:N1': CSCFormatBase(indptr=tensor([0, 1, 2]),
              indices=tensor([1, 0]),
), 'N1:R2:N0': CSCFormatBase(indptr=tensor([0, 2]),
              indices=tensor([0, 1]),
), 'N1:R3:N1': CSCFormatBase(indptr=tensor([0, 1, 3]),
              indices=tensor([0, 1, 2]),
)}
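To make the compaction concrete, the following is a minimal pure-Python sketch of an in subgraph on a homogeneous CSC graph. It is not GraphBolt's implementation: the helper name `in_subgraph_csc` is hypothetical, plain lists stand in for tensors, edge types are ignored, and keeping the original source IDs in the output is an assumption of this sketch.

```python
def in_subgraph_csc(indptr, indices, nodes):
    """Collect the incoming edges of `nodes` from CSC arrays.

    Column i of the result corresponds to nodes[i], so the output is
    compacted in the order the nodes were passed in.
    """
    sub_indptr = [0]
    sub_indices = []
    for v in nodes:
        start, end = indptr[v], indptr[v + 1]
        sub_indices.extend(indices[start:end])  # sources of v's in-edges
        sub_indptr.append(len(sub_indices))
    return sub_indptr, sub_indices


# Same indptr/indices as the example above, treated as homogeneous.
indptr = [0, 3, 5, 7, 9, 12]
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]
sub_indptr, sub_indices = in_subgraph_csc(indptr, indices, [1, 3, 4])
print(sub_indptr)   # [0, 2, 4, 7]
print(sub_indices)  # [2, 3, 1, 2, 0, 3, 4]
```

Homogeneous IDs 1, 3 and 4 correspond to the nodes N0:1, N1:1 and N1:2 requested in the example, given node_type_offset = [0, 2, 5].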
- sample_layer_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, random_seed: Tensor | None = None, seed2_contribution: float = 0.0) → SampledSubgraphImpl [source]
Sample neighboring edges of the given nodes and return the induced subgraph via layer-neighbor sampling from the NeurIPS 2023 paper "Layer-Neighbor Sampling – Defusing Neighborhood Explosion in GNNs".
- Parameters:
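seeds (torch.Tensor or Dict[str, torch.Tensor]) –
- IDs of the given seed nodes.
If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
If seeds is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node, with or without considering edge types.
When the length is 1, the fanout applies to all neighbors of the node as a collective, regardless of edge type.
Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
- The value of each fanout should be >= 0 or = -1.
When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. This is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).
When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sample is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.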
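random_seed (torch.Tensor, optional) –
An int64 tensor with one or two elements.
The passed random_seed makes it so that for any seed node s and its neighbor t, the rolled random variate r_t is the same for any call to this function with the same random seed. When sampling as part of the same batch, one would want identical seeds so that LABOR can sample globally. For example, on a heterogeneous graph the same random seed is passed for every edge type, which samples far fewer nodes than using a unique random seed for each edge type. If this function were called separately for each edge type of a heterogeneous graph with different random seeds, LABOR would run locally for each edge type, resulting in a larger number of sampled nodes.
If this function is called without a random_seed, the random seed is obtained by drawing a random number from GraphBolt. If the function is called multiple times as part of sampling a single batch, use the same random_seed argument. If two numbers are given, the seed2_contribution argument determines the interpolation between the two random seeds.
seed2_contribution (float, optional) – A float value in [0, 1) that determines the contribution of the second random seed, random_seed[-1], to the generated random variates.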
- Returns:
The sampled subgraph.
- Return type:
SampledSubgraphImpl
Examples
>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_layer_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
              indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
              indices=tensor([2]),
)}
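The role of random_seed can be illustrated with a small pure-Python sketch of shared per-neighbor random variates. This is only an illustration of the idea that r_t depends on the random seed and the neighbor t, not on the seed node s. It is not the LABOR algorithm or GraphBolt's implementation: the helpers `variate` and `pick_neighbors`, the way the seed and t are combined, and the "keep the smallest r_t" selection rule are all assumptions made for this sketch.

```python
import random


def variate(random_seed, t):
    """Random variate r_t that depends only on (random_seed, t)."""
    return random.Random(random_seed * 1_000_003 + t).random()


def pick_neighbors(neighbors, fanout, random_seed):
    """Keep the `fanout` neighbors with the smallest r_t (simplified rule)."""
    return sorted(neighbors, key=lambda t: variate(random_seed, t))[:fanout]


# Two seed nodes that share most of their in-neighbors.
neighbors_of = {"s0": [10, 11, 12, 13], "s1": [11, 12, 13, 14]}

# Passing the same random_seed for both seed nodes means every shared
# neighbor t is ranked by the same r_t, so the picks tend to overlap.
shared = {s: pick_neighbors(n, 2, random_seed=7) for s, n in neighbors_of.items()}
print(shared)

# r_t is reproducible across calls that use the same random_seed.
print(variate(7, 12) == variate(7, 12))  # True
```

Because the shared neighbors 11, 12 and 13 are ordered identically for both seed nodes, the union of picked neighbors tends to be smaller than it would be with independent draws per seed node.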
- sample_negative_edges_uniform(edge_type, node_pairs, negative_ratio)[source]
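Sample negative edges by randomly choosing negative source-destination edges according to a uniform distribution. For each edge (u, v), it generates negative_ratio pairs of negative edges (u, v'), where v' is chosen uniformly from all the nodes in the graph and u is exactly the same as in the corresponding positive edge. It returns the positive edges concatenated with the negative edges. In the negative edges, the negative sources are constructed from the corresponding positive edges.
- Parameters: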
- Returns:
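A 2D tensor representing the N pairs of positive and negative source-destination node pairs. In the context of a heterogeneous graph, both the input nodes and the selected nodes are represented by heterogeneous IDs, and the formed edges are of the input type edge_type. Note that negative refers to false negatives, which means the edge may or may not be present in the graph.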
- Return type:
torch.Tensor
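The sampling rule described above can be sketched in plain Python as follows. This is a hedged illustration rather than the library's code: the function name, the explicit `num_nodes` and `rng` parameters, and the use of lists of tuples instead of a 2D tensor are assumptions of this sketch.

```python
import random


def sample_negative_edges_uniform_sketch(node_pairs, negative_ratio, num_nodes, rng):
    """Return the positive pairs followed by negative_ratio negatives each."""
    negatives = []
    for u, _v in node_pairs:
        for _ in range(negative_ratio):
            v_neg = rng.randrange(num_nodes)  # uniform over all nodes
            negatives.append((u, v_neg))      # source u is reused as-is
    return list(node_pairs) + negatives


rng = random.Random(0)
positives = [(0, 1), (2, 3)]
pairs = sample_negative_edges_uniform_sketch(positives, 2, num_nodes=5, rng=rng)
print(len(pairs))  # 6
```

Since v' is drawn from all nodes without checking the graph, a sampled pair may coincide with an existing edge, which matches the note on false negatives.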
- sample_neighbors(seeds: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None) → SampledSubgraphImpl [source]
Sample neighboring edges of the given nodes and return the induced subgraph.
- Parameters:
seeds (torch.Tensor or Dict[str, torch.Tensor]) –
- IDs of the given seed nodes.
If seeds is a tensor: the graph is homogeneous, and the IDs inside are homogeneous IDs.
If seeds is a dictionary: the keys should be node types, and the IDs inside are heterogeneous IDs.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node, with or without considering edge types.
When the length is 1, the fanout applies to all neighbors of the node as a collective, regardless of edge type.
Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
- The value of each fanout should be >= 0 or = -1.
When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. This is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).
When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sample is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute to use. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.
- Returns:
The sampled subgraph.
- Return type:
SampledSubgraphImpl
Examples
>>> import dgl.graphbolt as gb
>>> import torch
>>> ntypes = {"n1": 0, "n2": 1}
>>> etypes = {"n1:e1:n2": 0, "n2:e2:n1": 1}
>>> indptr = torch.LongTensor([0, 2, 4, 6, 7, 9])
>>> indices = torch.LongTensor([2, 4, 2, 3, 0, 1, 1, 0, 1])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor([1, 1, 1, 1, 0, 0, 0, 0, 0])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> nodes = {'n1': torch.LongTensor([0]), 'n2': torch.LongTensor([0])}
>>> fanouts = torch.tensor([1, 1])
>>> subgraph = graph.sample_neighbors(nodes, fanouts)
>>> print(subgraph.sampled_csc)
{'n1:e1:n2': CSCFormatBase(indptr=tensor([0, 1]),
              indices=tensor([0]),
), 'n2:e2:n1': CSCFormatBase(indptr=tensor([0, 1]),
              indices=tensor([2]),
)}
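The interaction between fanout, -1 and replace can be sketched for a single seed node in plain Python. This is a simplified reading, not GraphBolt's sampler: the helper `sample_in_neighbors` is hypothetical, it ignores edge types and probs_name, and it treats a non-negative fanout as the number of neighbors to draw (capped at the neighbor count when sampling without replacement), which is an assumption of this sketch.

```python
import random


def sample_in_neighbors(indptr, indices, seed, fanout, replace, rng):
    """Sample in-neighbors of one seed node from CSC arrays (plain lists)."""
    neighbors = indices[indptr[seed]:indptr[seed + 1]]
    if fanout == -1 or not neighbors:
        # -1 keeps every neighbor exactly once, regardless of `replace`.
        return list(neighbors)
    if replace:
        # With replacement, the same neighbor may be drawn several times.
        return [rng.choice(neighbors) for _ in range(fanout)]
    # Without replacement, at most len(neighbors) distinct picks exist.
    return rng.sample(neighbors, min(fanout, len(neighbors)))


# Same indptr/indices as the example above, treated as homogeneous.
indptr = [0, 2, 4, 6, 7, 9]
indices = [2, 4, 2, 3, 0, 1, 1, 0, 1]
rng = random.Random(0)

everything = sample_in_neighbors(indptr, indices, 0, -1, False, rng)
one = sample_in_neighbors(indptr, indices, 0, 1, False, rng)
capped = sample_in_neighbors(indptr, indices, 0, 5, False, rng)
repeated = sample_in_neighbors(indptr, indices, 0, 5, True, rng)
print(everything)      # [2, 4]
print(sorted(capped))  # [2, 4]
print(len(repeated))   # 5
```

The `capped` call shows the equivalence stated for -1: without replacement, a fanout at least as large as the neighbor count returns every neighbor once.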
- temporal_sample_neighbors(nodes: Tensor | Dict[str, Tensor], input_nodes_timestamp: Tensor | Dict[str, Tensor], fanouts: Tensor, replace: bool = False, probs_name: str | None = None, node_timestamp_attr_name: str | None = None, edge_timestamp_attr_name: str | None = None) → ScriptObject [source]
Temporally sample neighboring edges of the given nodes and return the induced subgraph.
If node_timestamp_attr_name or edge_timestamp_attr_name is given, the sampled neighbors or edges of an input node must have an earlier timestamp than that of the input node.
- Parameters:
nodes (torch.Tensor) – IDs of the given seed nodes.
input_nodes_timestamp (torch.Tensor) – Timestamps of the given seed nodes.
fanouts (torch.Tensor) –
The number of edges to be sampled for each node with or without considering edge types.
When the length is 1, it indicates that the fanout applies to all neighbors of the node as a collective, regardless of the edge type.
Otherwise, the length should equal the number of edge types, and each fanout value corresponds to a specific edge type of the nodes.
- The value of each fanout should be >= 0 or = -1.
When the value is -1, all neighbors (with non-zero probability, if weighted) will be sampled once regardless of replacement. It is equivalent to selecting all neighbors with non-zero probability when the fanout is >= the number of neighbors (and replace is set to false).
When the value is a non-negative integer, it serves as a minimum threshold for selecting neighbors.
replace (bool) – Boolean indicating whether the sample is performed with or without replacement. If True, a value can be selected multiple times. Otherwise, each value can be selected only once.
probs_name (str, optional) – An optional string specifying the name of an edge attribute. This attribute tensor should contain (unnormalized) probabilities corresponding to each neighboring edge of a node. It must be a 1D floating-point or boolean tensor, with the number of elements equal to the total number of edges.
node_timestamp_attr_name (str, optional) – An optional string specifying the name of a node attribute.
edge_timestamp_attr_name (str, optional) – An optional string specifying the name of an edge attribute.
- Returns:
The sampled subgraph.
- Return type:
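As an illustration of the temporal constraint on node timestamps, the candidate filter can be sketched in plain Python. The helper `temporal_candidates`, the `node_timestamp` list and the strict "earlier than" comparison are assumptions of this sketch, and the actual sampling among the remaining candidates is omitted.

```python
def temporal_candidates(indptr, indices, node_timestamp, seed, seed_timestamp):
    """In-neighbors of `seed` whose timestamp is earlier than the seed's."""
    neighbors = indices[indptr[seed]:indptr[seed + 1]]
    return [t for t in neighbors if node_timestamp[t] < seed_timestamp]


indptr = [0, 2, 4, 6, 7, 9]
indices = [2, 4, 2, 3, 0, 1, 1, 0, 1]
node_timestamp = [5, 3, 1, 4, 9]  # hypothetical per-node timestamps

# Node 0 has in-neighbors 2 (timestamp 1) and 4 (timestamp 9).
candidates = temporal_candidates(indptr, indices, node_timestamp, seed=0, seed_timestamp=5)
print(candidates)  # [2]
```

Only the candidates that pass this filter would then be subject to the fanout and replace rules.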
- property csc_indptr: tensor
Returns the indices pointer in the CSC graph.
- Returns:
The indices pointer in the CSC graph. An integer tensor with shape (total_num_nodes+1,).
- Return type:
torch.tensor
- property edge_attributes: Dict[str, Tensor] | None
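Returns the edge attributes dictionary.
- Returns:
If present, returns a dictionary of edge attributes. Each key represents the attribute's name, while the corresponding value holds the attribute's specific value. The length of each value should match the total number of edges.
- Return type:
Dict[str, torch.Tensor] or None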
- property indices: tensor
Returns the indices in the CSC graph.
- Returns:
The indices in the CSC graph. An integer tensor with shape (total_num_edges,).
- Return type:
torch.tensor
Notes
It is assumed that the edges of each node are already sorted by edge type ID.
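For readers less familiar with CSC, the following plain-Python sketch shows how csc_indptr and indices fit together. It uses lists in place of tensors, with the arrays borrowed from the examples on this page; the variable names are illustrative only.

```python
indptr = [0, 3, 5, 7, 9, 12]                     # shape (total_num_nodes + 1,)
indices = [0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4]   # shape (total_num_edges,)

num_nodes = len(indptr) - 1

# The in-degree of node i is the width of its slice in `indices`.
in_degree = [indptr[i + 1] - indptr[i] for i in range(num_nodes)]
print(in_degree)  # [3, 2, 2, 2, 3]

# Expanding the slices yields the (source, destination) edge list.
edges = [
    (src, dst)
    for dst in range(num_nodes)
    for src in indices[indptr[dst]:indptr[dst + 1]]
]
print(edges[:3])  # [(0, 0), (1, 0), (4, 0)]
```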
- property node_attributes: Dict[str, Tensor] | None
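Returns the node attributes dictionary.
- Returns:
If present, returns a dictionary of node attributes. Each key represents the attribute's name, while the corresponding value holds the attribute's specific value. The length of each value should match the total number of nodes.
- Return type:
Dict[str, torch.Tensor] or None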
- property node_type_offset: Tensor | None
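Returns the node type offset tensor if present. Do not modify the returned tensor in place.
- Returns:
If present, returns a 1D integer tensor of shape (num_node_types + 1,). The tensor is in ascending order because nodes of the same type have contiguous IDs, and larger node IDs are paired with larger node type IDs. The first value is 0 and the last value is the number of nodes. Nodes with IDs from node_type_offset_[i] up to (but not including) node_type_offset_[i+1] are of type ID 'i'.
- Return type:
torch.Tensor or None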
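The offset layout makes it possible to recover a node's type and its type-local ID from a homogeneous ID with a binary search. The sketch below uses a plain list and the hypothetical helper `to_typed_id`; it assumes each type covers the half-open ID range starting at its offset.

```python
from bisect import bisect_right


def to_typed_id(node_type_offset, node_id):
    """Map a homogeneous node ID to (type ID, ID within that type)."""
    type_id = bisect_right(node_type_offset, node_id) - 1
    return type_id, node_id - node_type_offset[type_id]


node_type_offset = [0, 2, 5]  # type 0 owns IDs 0-1, type 1 owns IDs 2-4
print(to_typed_id(node_type_offset, 1))  # (0, 1)
print(to_typed_id(node_type_offset, 3))  # (1, 1)
```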
- property num_edges: int | Dict[str, int]
The number of edges in the graph. If the graph is homogeneous, returns an integer. If the graph is heterogeneous, returns a dictionary.
Examples
>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> metadata = gb.GraphMetadata(ntypes, etypes)
>>> graph = gb.fused_csc_sampling_graph(indptr, indices, node_type_offset,
...     type_per_edge, None, metadata)
>>> print(graph.num_edges)
{'N0:R0:N0': 2, 'N0:R1:N1': 1, 'N1:R2:N0': 2, 'N1:R3:N1': 3}
- property num_nodes: int | Dict[str, int]
The number of nodes in the graph. If the graph is homogeneous, returns an integer. If the graph is heterogeneous, returns a dictionary.
Examples
>>> import dgl.graphbolt as gb, torch
>>> total_num_nodes = 5
>>> total_num_edges = 12
>>> ntypes = {"N0": 0, "N1": 1}
>>> etypes = {"N0:R0:N0": 0, "N0:R1:N1": 1,
...     "N1:R2:N0": 2, "N1:R3:N1": 3}
>>> indptr = torch.LongTensor([0, 3, 5, 7, 9, 12])
>>> indices = torch.LongTensor([0, 1, 4, 2, 3, 0, 1, 1, 2, 0, 3, 4])
>>> node_type_offset = torch.LongTensor([0, 2, 5])
>>> type_per_edge = torch.LongTensor(
...     [0, 0, 2, 2, 2, 1, 1, 1, 3, 1, 3, 3])
>>> graph = gb.fused_csc_sampling_graph(indptr, indices,
...     node_type_offset=node_type_offset,
...     type_per_edge=type_per_edge,
...     node_type_to_id=ntypes,
...     edge_type_to_id=etypes)
>>> print(graph.num_nodes)
{'N0': 2, 'N1': 3}
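The per-type counts printed above are consistent with taking differences of consecutive entries in node_type_offset. A minimal plain-Python sketch of that relationship follows. It uses lists instead of tensors and is only an illustration of the arithmetic, not a claim about how the property is implemented.

```python
ntypes = {"N0": 0, "N1": 1}
node_type_offset = [0, 2, 5]

# Each type owns the IDs between its offset and the next type's offset.
num_nodes = {
    name: node_type_offset[type_id + 1] - node_type_offset[type_id]
    for name, type_id in ntypes.items()
}
print(num_nodes)  # {'N0': 2, 'N1': 3}
```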