torch_geometric.data.InMemoryDataset
- class InMemoryDataset(root: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, log: bool = True, force_reload: bool = False)
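Bases: Dataset

Dataset base class for creating graph datasets which easily fit into CPU memory. See the accompanying tutorial on creating in-memory datasets for a walkthrough.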
- Parameters:
  - root (str, optional) – Root directory where the dataset should be saved. (default: None)
  - transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
  - pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
  - pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
  - log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
  - force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
- property raw_file_names: Union[str, List[str], Tuple[str, ...]]
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
- property processed_file_names: Union[str, List[str], Tuple[str, ...]]
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- classmethod save(data_list: Sequence[BaseData], path: str) → None
Saves a list of data objects to the file path path.
- Return type: None
- load(path: str, data_cls: Type[BaseData] = Data) → None
Loads the dataset from the file path path.
- Return type: None
- static collate(data_list: Sequence[BaseData]) → Tuple[BaseData, Optional[Dict[str, Tensor]]]
Collates a list of Data or HeteroData objects into the internal storage format of InMemoryDataset.
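- copy(idx: Optional[Union[slice, Tensor, ndarray, Sequence]] = None) → InMemoryDataset
Performs a deep copy of the dataset. If idx is not given, the full dataset is cloned. Otherwise, only the subset of the dataset selected by the indices idx is cloned. Indices can be slices, lists, tuples, or a torch.Tensor or np.ndarray of type long or bool.
- Return type: InMemoryDataset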
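- to_on_disk_dataset(root: Optional[str] = None, backend: str = 'sqlite', log: bool = True) → OnDiskDataset
Converts the InMemoryDataset to an OnDiskDataset variant. This is useful for distributed training and for hardware instances with a limited amount of shared memory.
- Parameters:
  - root (str, optional) – Root directory where the dataset should be saved. If set to None, the dataset will be saved in root/on_disk, where root is the root directory of this InMemoryDataset. Note that it is important to specify root to account for different dataset splits. (default: None)
  - backend (str) – The Database backend to use. (default: "sqlite")
  - log (bool, optional) – Whether to print any console output while processing the dataset. (default: True)
- Return type: OnDiskDataset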
- cpu(*args: str) → InMemoryDataset
Moves the dataset to CPU memory.
- Return type: InMemoryDataset
- property has_download: bool
Checks whether the dataset defines a download() method.
- index_select(idx: Union[slice, Tensor, ndarray, Sequence]) → Dataset
Creates a subset of the dataset from the specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
- Return type: Dataset
- property num_features: int
Returns the number of features per node in the dataset. Alias for num_node_features.
- Return type: int
- to_datapipe() → Any
Converts the dataset into a torch.utils.data.DataPipe.
The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

```python
from torch_geometric.datasets import QM9

# Wrap QM9 (downloaded to ./data/QM9/ on first use) as a DataPipe.
dp = QM9(root='./data/QM9/').to_datapipe()
# Group graphs into mini-batches of 2, dropping an incomplete final batch.
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass  # each `batch` holds two graphs
```

See the PyTorch tutorial for further background on DataPipes.
- Return type: Any