torch_geometric.datasets.ZINC

class ZINC(root: str, subset: bool = False, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, force_reload: bool = False)

Bases: InMemoryDataset

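The ZINC dataset from the ZINC database and the "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules" paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models; see, e.g., the "Junction Tree Variational Autoencoder for Molecular Graph Generation" and "Grammar Variational Autoencoder" papers.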
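As a small illustration of the target definition above, the following sketch combines three precomputed components into the penalized logP. The function name `penalized_logp` and its arguments are hypothetical and not part of PyTorch Geometric; computing logP or SAS from a molecule would require a cheminformatics toolkit and is not shown here.

```python
def penalized_logp(logp: float, sas: float, num_large_cycles: int) -> float:
    """Combine precomputed components as y = logP - SAS - cycles.

    All three inputs are assumed to be computed elsewhere; this helper is
    illustrative only and is not part of torch_geometric.
    """
    return logp - sas - num_large_cycles


# Hypothetical component values for a single molecule.
y = penalized_logp(logp=2.5, sas=3.0, num_large_cycles=1)
print(y)
```

A higher SAS or more large cycles lowers the score, so the target rewards molecules that are lipophilic, easy to synthesize, and free of large rings.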

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • subset (bool, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the "Benchmarking Graph Neural Networks" paper. (default: False)

  • split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
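
The parameters above can be combined as in the following minimal sketch, which builds one dataset object per split. The helper name `load_zinc_splits` and the `SPLITS` constant are hypothetical conveniences, not part of PyTorch Geometric; only the `ZINC(root, subset=..., split=...)` call follows the signature documented here.

```python
SPLITS = ("train", "val", "test")


def load_zinc_splits(root, subset=True):
    """Return a dict mapping each split name to a ZINC dataset instance."""
    # Imported inside the function so this sketch can be imported without
    # PyTorch Geometric installed; normally the import sits at module top.
    from torch_geometric.datasets import ZINC

    # One dataset object per split; all of them share the same root directory.
    return {split: ZINC(root, subset=subset, split=split) for split in SPLITS}
```

Calling `load_zinc_splits("data/ZINC")`, where `data/ZINC` is an arbitrary example directory, would download and process the data on first use and return the three splits of the 12,000-graph subset. Pass `subset=False` to load the full dataset instead.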

Stats:

Name          #graphs   #nodes   #edges   #features   #classes
ZINC Full     249,456   ~23.2    ~49.8    1           1
ZINC Subset   12,000    ~23.2    ~49.8    1           1