torch_geometric.data.OnDiskDataset
- class OnDiskDataset(root: str, transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, backend: str = 'sqlite', schema: Union[Any, Dict[str, Any], Tuple[Any], List[Any]] = object, log: bool = True)
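Bases: Dataset
A base class for creating large graph datasets that do not easily fit into CPU memory at once, by leveraging a Database backend for on-disk storage and access of data objects.
- Parameters: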
root (str) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
backend (str) – The Database backend to use (one of "sqlite" or "rocksdb"). (default: "sqlite")
schema (Any or Tuple[Any] or Dict[str, Any], optional) – The schema of the input data. Can take int, float, str, object, or a dictionary with dtype and size keys (for specifying tensor data) as input, and can be nested as a tuple or dictionary. Specifying the schema will improve efficiency, since by default the database will use Python pickling for serializing and deserializing. If set to anything other than object, implementations of OnDiskDataset need to override the serialize() and deserialize() methods. (default: object)
log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
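The difference between pre_filter (which decides whether an object is included in the dataset) and transform (applied on every access) can be illustrated without PyG. The following is a minimal standard-library sketch, assuming a pickle-backed SQLite table and assuming the filter runs at write time; TinyOnDiskStore and its methods are hypothetical names chosen for illustration and are not part of torch_geometric.

```python
import pickle
import sqlite3
from typing import Any, Callable, Optional


class TinyOnDiskStore:
    """Hypothetical stand-in: objects live in SQLite, not in a Python list."""

    def __init__(
        self,
        path: str = ":memory:",
        transform: Optional[Callable[[Any], Any]] = None,
        pre_filter: Optional[Callable[[Any], bool]] = None,
    ) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, payload BLOB)"
        )
        self.transform = transform
        self.pre_filter = pre_filter

    def append(self, obj: Any) -> None:
        # Assumption: pre_filter decides once, at write time, whether to keep obj.
        if self.pre_filter is not None and not self.pre_filter(obj):
            return
        self.conn.execute(
            "INSERT INTO items (id, payload) VALUES (?, ?)",
            (len(self), pickle.dumps(obj)),
        )
        self.conn.commit()

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

    def __getitem__(self, idx: int) -> Any:
        row = self.conn.execute(
            "SELECT payload FROM items WHERE id = ?", (idx,)
        ).fetchone()
        if row is None:
            raise IndexError(idx)
        obj = pickle.loads(row[0])
        # transform runs on every access; the stored bytes stay untouched.
        return self.transform(obj) if self.transform is not None else obj


store = TinyOnDiskStore(
    transform=lambda d: {**d, "y": d["y"] * 2},
    pre_filter=lambda d: d["y"] > 0,
)
for y in (-1, 1, 2):
    store.append({"y": y})
```

Here the object with y = -1 is never written, while the doubling is recomputed each time an item is read, so changing the transform later would not require rewriting the stored data.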
- property processed_file_names: str
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- Return type: str
- deserialize(data: Any) → BaseData
Deserializes the DB entry into a Data or HeteroData object.
- Return type: BaseData
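A schema other than object requires a matching serialize() and deserialize() pair, as described for the schema parameter. The round trip can be sketched with the standard library alone. The schema layout, the Sample class and both functions below are hypothetical illustrations of the idea, not PyG's code, and array stands in for tensor storage.

```python
from array import array
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical schema: one variable-length float64 vector and one integer label.
SCHEMA: Dict[str, Any] = {"x": {"dtype": "d", "size": (-1,)}, "y": int}


@dataclass
class Sample:
    x: List[float]
    y: int


def serialize(sample: Sample) -> Dict[str, Any]:
    # Map the object to one flat entry per schema key; the vector becomes raw bytes.
    return {
        "x": array(SCHEMA["x"]["dtype"], sample.x).tobytes(),
        "y": int(sample.y),
    }


def deserialize(row: Dict[str, Any]) -> Sample:
    # Inverse of serialize(): rebuild the vector from bytes using the schema dtype.
    x = array(SCHEMA["x"]["dtype"])
    x.frombytes(row["x"])
    return Sample(x=x.tolist(), y=row["y"])


original = Sample(x=[0.5, 1.25, -3.0], y=7)
restored = deserialize(serialize(original))
```

Because every field has a declared type, each value can be stored in its own typed column instead of one opaque pickled blob, which is the efficiency argument made for the schema parameter.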
- extend(data_list: Sequence[BaseData], batch_size: Optional[int] = None) → None
Extends the dataset by a list of data objects.
- Return type: None
- multi_get(indices: Union[Iterable[int], Tensor, slice, range], batch_size: Optional[int] = None) → List[BaseData]
Gets a list of data objects from the specified indices.
- Return type: List[BaseData]
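One plausible reading of batch_size in extend() and multi_get() is that objects are written or read in chunks rather than all at once. The sketch below shows that pattern with sqlite3 and pickle. The table layout and the stand-alone helpers named extend and multi_get are illustrative assumptions and do not reproduce PyG's Database classes.

```python
import pickle
import sqlite3
from typing import Any, Iterable, Iterator, List, Optional, Sequence


def _chunks(items: Sequence[Any], size: Optional[int]) -> Iterator[Sequence[Any]]:
    # batch_size=None means "one single batch".
    size = size or max(len(items), 1)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extend(conn: sqlite3.Connection, objs: Sequence[Any],
           batch_size: Optional[int] = None) -> None:
    offset = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    rows = [(offset + i, pickle.dumps(obj)) for i, obj in enumerate(objs)]
    for chunk in _chunks(rows, batch_size):
        conn.executemany("INSERT INTO items (id, payload) VALUES (?, ?)", chunk)
        conn.commit()  # one transaction per chunk


def multi_get(conn: sqlite3.Connection, indices: Iterable[int],
              batch_size: Optional[int] = None) -> List[Any]:
    indices = list(indices)
    found = {}
    for chunk in _chunks(indices, batch_size):
        marks = ",".join("?" * len(chunk))
        query = f"SELECT id, payload FROM items WHERE id IN ({marks})"
        for idx, payload in conn.execute(query, chunk):
            found[idx] = pickle.loads(payload)
    # Return objects in the order the caller asked for, not in storage order.
    return [found[i] for i in indices]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB)")
extend(conn, [{"graph": i} for i in range(10)], batch_size=4)
subset = multi_get(conn, [7, 2, 5], batch_size=2)
```

Chunking bounds the number of objects held in memory per query, at the cost of more round trips to the database.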
- property has_download: bool
Checks whether the dataset defines a download() method.
- Return type: bool
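- index_select(idx: Union[slice, Tensor, ndarray, Sequence]) → Dataset
Creates a subset of the dataset from the specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
- Return type: Dataset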
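The accepted forms of idx can be made concrete with a small stand-alone helper. This sketch uses plain Python sequences in place of torch.Tensor or np.ndarray, and resolve_indices is a hypothetical function written for illustration, not the routine PyG uses. The rule that a boolean mask must cover the whole dataset is likewise an assumption of the sketch.

```python
from typing import List, Sequence, Union


def resolve_indices(idx: Union[slice, Sequence], num_items: int) -> List[int]:
    """Turn a slice, a boolean mask, or integer indices into explicit positions."""
    if isinstance(idx, slice):
        return list(range(*idx.indices(num_items)))
    idx = list(idx)
    # bool is a subclass of int in Python, so check for a mask first.
    if idx and all(isinstance(i, bool) for i in idx):
        if len(idx) != num_items:
            raise IndexError("boolean mask must cover the whole dataset")
        return [pos for pos, keep in enumerate(idx) if keep]
    # Integer ("long") indices; negative values count from the end.
    return [i if i >= 0 else num_items + i for i in idx]


from_slice = resolve_indices(slice(2, 5), 10)
from_mask = resolve_indices([True, False, True, False], 4)
from_ints = resolve_indices((3, 0, -1), 10)
```

All three forms reduce to a list of positions, which is what a subset view needs in order to look up the selected objects lazily.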
- property num_features: int
Returns the number of features per node in the dataset. Alias for num_node_features.
- Return type: int
- property raw_file_names: Union[str, List[str], Tuple[str, ...]]
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.
- to_datapipe() → Any
Converts the dataset into a torch.utils.data.DataPipe.
The returned instance can then be used with PyG’s built-in DataPipes for batching graphs as follows:

    from torch_geometric.datasets import QM9

    dp = QM9(root='./data/QM9/').to_datapipe()
    dp = dp.batch_graphs(batch_size=2, drop_last=True)

    for batch in dp:
        pass

See the PyTorch tutorial for further background on DataPipes.
- Return type: Any