torch_geometric.data.InMemoryDataset

class InMemoryDataset(root: Optional[str] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, pre_filter: Optional[Callable] = None, log: bool = True, force_reload: bool = False)

Bases: Dataset

Dataset base class for creating graph datasets which easily fit into CPU memory. See here for the accompanying tutorial.

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. (default: None)

  • transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

property raw_file_names: Union[str, List[str], Tuple[str, ...]]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

Return type:

Union[str, List[str], Tuple[str, ...]]

property processed_file_names: Union[str, List[str], Tuple[str, ...]]

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

Return type:

Union[str, List[str], Tuple[str, ...]]

property num_classes: int

Returns the number of classes in the dataset.

Return type:

int

len() → int

Returns the number of data objects stored in the dataset.

Return type:

int

get(idx: int) → BaseData

Gets the data object at index idx.

Return type:

BaseData

classmethod save(data_list: Sequence[BaseData], path: str) → None

Saves a list of data objects to the file path path.

Return type:

None

load(path: str, data_cls: Type[BaseData] = Data) → None

Loads the dataset from the file path path.

Return type:

None

static collate(data_list: Sequence[BaseData]) → Tuple[BaseData, Optional[Dict[str, Tensor]]]

Collates a list of Data or HeteroData objects to the internal storage format of InMemoryDataset.

Return type:

Tuple[BaseData, Optional[Dict[str, Tensor]]]

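copy(idx: Optional[Union[slice, Tensor, ndarray, Sequence]] = None) → InMemoryDataset

Performs a deep copy of the dataset. If idx is not given, the full dataset is cloned. Otherwise, only the subset of the dataset given by the indices idx is cloned. Indices can be slices, lists, tuples, or a torch.Tensor or np.ndarray of type long or bool.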

Return type:

InMemoryDataset

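to_on_disk_dataset(root: Optional[str] = None, backend: str = 'sqlite', log: bool = True) → OnDiskDataset

Converts the InMemoryDataset to an OnDiskDataset variant. Useful for distributed training and for hardware instances with a limited amount of shared memory.

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. If set to None, will save the dataset in root/on_disk. Note that it is important to specify root to account for different dataset splits. (default: None)

  • backend (str) – The database backend to use. (default: "sqlite")

  • log (bool, optional) – Whether to print any console output while processing the dataset. (default: True)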

Return type:

OnDiskDataset

to(device: Union[int, str]) → InMemoryDataset

Performs device conversion of the whole dataset.

Return type:

InMemoryDataset

cpu(*args: str) → InMemoryDataset

Moves the dataset to CPU memory.

Return type:

InMemoryDataset

cuda(device: Optional[Union[int, str]] = None) → InMemoryDataset

Moves the dataset to CUDA memory.

Return type:

InMemoryDataset

download() → None

Downloads the dataset to the self.raw_dir folder.

Return type:

None

get_summary() → Any

Collects summary statistics for the dataset.

Return type:

Any

property has_download: bool

Checks whether the dataset defines a download() method.

Return type:

bool

property has_process: bool

Checks whether the dataset defines a process() method.

Return type:

bool

index_select(idx: Union[slice, Tensor, ndarray, Sequence]) → Dataset

Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.

Return type:

Dataset

property num_edge_features: int

Returns the number of features per edge in the dataset.

Return type:

int

property num_features: int

Returns the number of features per node in the dataset. Alias for num_node_features.

Return type:

int

property num_node_features: int

Returns the number of features per node in the dataset.

Return type:

int

print_summary(fmt: str = 'psql') → None

Prints summary statistics of the dataset to the console.

Parameters:

fmt (str, optional) – Summary tables format. Available table formats can be found here. (default: "psql")

Return type:

None

process() → None

Processes the dataset to the self.processed_dir folder.

Return type:

None

property processed_paths: List[str]

The absolute file paths that must be present in order to skip processing.

Return type:

List[str]

property raw_paths: List[str]

The absolute file paths that must be present in order to skip downloading.

Return type:

List[str]

shuffle(return_perm: bool = False) → Union[Dataset, Tuple[Dataset, Tensor]]

Randomly shuffles the examples in the dataset.

Parameters:

return_perm (bool, optional) – If set to True, will also return the random permutation used to shuffle the dataset. (default: False)

Return type:

Union[Dataset, Tuple[Dataset, Tensor]]

to_datapipe() → Any

Converts the dataset into a torch.utils.data.DataPipe.

The returned instance can then be used with built-in DataPipes for batching graphs as follows:

from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass

See the PyTorch tutorial for further background on DataPipes.

Return type:

Any