torch_geometric.datasets.ZINC
- class ZINC(root: str, subset: bool = False, split: str = 'train', transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, force_reload: bool = False)
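Bases: InMemoryDataset
The ZINC dataset from the ZINC database and the "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules" paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models; see, e.g., the "Junction Tree Variational Autoencoder for Molecular Graph Generation" and "Grammar Variational Autoencoder" papers.
- Parameters: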
root (str) – Root directory where the dataset should be saved.
subset (bool, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the "Benchmarking Graph Neural Networks" paper. (default: False)
split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
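As an illustration of how the parameters above fit together, the following is a minimal sketch rather than an official recipe. The helper names `make_size_filter` and `load_zinc_splits`, the `SPLITS` tuple, and the 30-node threshold are hypothetical and chosen only for this example. The `pre_filter` callback relies on the `num_nodes` attribute of `torch_geometric.data.Data`, and `torch_geometric` is imported inside the loader so that the filter can be tried without the library installed.

```python
from types import SimpleNamespace
from typing import Callable, Dict

# The three split names accepted by the `split` parameter
SPLITS = ("train", "val", "test")


def make_size_filter(max_nodes: int) -> Callable[[object], bool]:
    """Build a pre_filter that keeps graphs with at most `max_nodes` nodes."""
    def keep(data) -> bool:
        # `data` is expected to expose `num_nodes`, as Data objects do
        return data.num_nodes <= max_nodes
    return keep


def load_zinc_splits(root: str, subset: bool = True, pre_filter=None) -> Dict[str, object]:
    """Return one ZINC dataset per split (hypothetical convenience helper)."""
    # Lazy import: only needed when the datasets are actually loaded
    from torch_geometric.datasets import ZINC

    return {
        split: ZINC(root, subset=subset, split=split, pre_filter=pre_filter)
        for split in SPLITS
    }


# Stand-in object with a `num_nodes` attribute, used only to show the
# callback contract; it is not a real molecular graph.
toy = SimpleNamespace(num_nodes=23)
print(make_size_filter(30)(toy))  # True
print(make_size_filter(20)(toy))  # False
```

Calling `load_zinc_splits("data/ZINC", pre_filter=make_size_filter(30))` would download and process the data on first use; the path is an arbitrary example. Because `pre_filter` is applied when the dataset is processed and saved, changing the filter later will likely require passing `force_reload=True` to the `ZINC` constructor.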
Statistics:
Name          #graphs   #nodes   #edges   #features   #classes
ZINC Full     249,456   ~23.2    ~49.8    1           1
ZINC Subset   12,000    ~23.2    ~49.8    1           1
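The regression target from the class description, y = logP - SAS - cycles, can be illustrated with a short arithmetic sketch. The function name `penalized_logp` is hypothetical and the input values are made up for illustration; in the dataset the target is already precomputed, and this sketch does not show how logP or SAS themselves are obtained.

```python
def penalized_logp(logp: float, sas: float, cycles: int) -> float:
    """Return y = logP - SAS - cycles.

    logp:   water-octanol partition coefficient
    sas:    synthetic accessibility score
    cycles: number of cycles with more than six atoms
    """
    return logp - sas - cycles


# Made-up inputs for illustration only; not taken from the dataset
y = penalized_logp(logp=2.5, sas=3.0, cycles=1)
print(y)  # -1.5
```

A higher logP raises the target, while a higher synthetic accessibility score or additional large cycles lower it.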