torch_geometric.datasets.TUDataset

class TUDataset(root: str, name: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, force_reload: bool = False, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False)

Bases: InMemoryDataset

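A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from TU Dortmund University. In addition, this dataset wrapper provides cleaned versions of the datasets, motivated by the "Understanding Isomorphism Bias in Graph Data Sets" paper, that contain only non-isomorphic graphs.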

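Note

Some datasets may not come with any node labels. In that case, you can either use the argument use_node_attr to load additional continuous node attributes (if present), or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.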

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • name (str) – The name of the dataset.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

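  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)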

Statistics:

Name            #graphs   #nodes   #edges    #features   #classes
MUTAG           188       ~17.9    ~39.6     7           2
                600       ~32.6    ~124.3    3           6
PROTEINS        1,113     ~39.1    ~145.6    3           2
COLLAB          5,000     ~74.5    ~4914.4   0           3
IMDB-BINARY     1,000     ~19.8    ~193.1    0           2
REDDIT-BINARY   2,000     ~429.6   ~995.5    0           2