FraudDataset

class dgl.data.FraudDataset(name, raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)[source]

Bases: DGLBuiltinDataset

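Fraud node prediction dataset.

The dataset includes two multi-relational graphs extracted from Yelp and Amazon, where nodes represent fraudulent reviews or fraudulent reviewers.

It was first proposed in a CIKM'20 paper <https://arxiv.org/pdf/2008.08692.pdf> and has been used as a benchmark by a recent WWW'21 paper <https://ponderly.github.io/pub/PCGNN_WWW2021.pdf>. Another paper <https://arxiv.org/pdf/2104.01404.pdf> also uses the dataset as an example for studying non-homophilous graphs. The dataset is built on industrial data and has rich relational information and distinctive properties such as class imbalance and feature inconsistency, which make it a good instance for investigating how GNNs perform on noisy real-world graphs. The graphs are bidirected and contain no self-loops.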

Reference: <https://github.com/YingtongDou/CARE-GNN>
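The "bidirected and no self-loops" property described above can be illustrated on a plain edge list. This is a minimal sketch in pure Python; the helper name `is_bidirected_without_self_loops` and the toy edges are hypothetical and are not part of DGL or of the dataset.

```python
def is_bidirected_without_self_loops(edges):
    """Return True if every edge (u, v) has a reverse edge (v, u)
    and no edge connects a node to itself.

    `edges` is an iterable of (source, destination) pairs.
    """
    edge_set = set(edges)
    no_self_loops = all(u != v for u, v in edge_set)
    bidirected = all((v, u) in edge_set for u, v in edge_set)
    return no_self_loops and bidirected


# Toy edge list (hypothetical data, not taken from the Yelp or Amazon graphs).
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
toy_ok = is_bidirected_without_self_loops(toy_edges)
print(toy_ok)
```

For a multi-relational graph, the same check would presumably be applied to each relation's edge list separately.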

Parameters:
  • name (str) – Name of the dataset, either 'yelp' or 'amazon'

  • raw_dir (str) – Directory that will store the downloaded data, or that already contains the input data. Default: ~/.dgl/

  • random_seed (int) – Random seed used when splitting the dataset. Default: 717

  • train_size (float) – Fraction of the dataset used for training. Default: 0.7

  • val_size (float) – Fraction of the dataset used for validation; the test set takes the remaining (1 - train_size - val_size). Default: 0.1

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
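How `random_seed`, `train_size` and `val_size` interact can be sketched with a seeded shuffle followed by slicing. This is an illustration only, under the assumption that the split works on a shuffled index list; it is not DGL's internal implementation, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(num_nodes, train_size=0.7, val_size=0.1, random_seed=717):
    """Illustrative seeded train/val/test split over node indices.

    The test set receives whatever remains after the training and
    validation slices, i.e. roughly (1 - train_size - val_size).
    """
    if train_size < 0 or val_size < 0 or train_size + val_size > 1:
        raise ValueError("train_size and val_size must be >= 0 and sum to at most 1")

    indices = list(range(num_nodes))
    # A dedicated Random instance makes the permutation depend only on the seed.
    random.Random(random_seed).shuffle(indices)

    n_train = int(train_size * num_nodes)
    n_val = int(val_size * num_nodes)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(8, train_size=0.5, val_size=0.25)
print(len(train_idx), len(val_idx), len(test_idx))
```

Calling the helper twice with the same seed gives the same partition, which is the reason a fixed default seed makes results comparable across runs.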

num_classes

Number of label classes

Type:

int

graph

Graph structure, etc.

Type:

dgl.DGLGraph

seed

Random seed used when splitting the dataset.

Type:

int

train_size

Training set size of the dataset.

Type:

float

val_size

Validation set size of the dataset.

Type:

float

Examples

>>> from dgl.data import FraudDataset
>>> dataset = FraudDataset('yelp')
>>> graph = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = graph.ndata['feature']
>>> label = graph.ndata['label']

__getitem__(idx)[source]

Get graph object

Parameters:

idx (int) – Item index

Returns:

graph structure, node features, node labels and masks

  • ndata['feature']: node features

  • ndata['label']: node labels

  • ndata['train_mask']: mask of training set

  • ndata['val_mask']: mask of validation set

  • ndata['test_mask']: mask of testing set

Return type:

dgl.DGLGraph
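Because the dataset is class-imbalanced, it can help to look at the label distribution inside each split. The sketch below shows how a boolean mask selects labels. It uses plain Python lists as stand-ins for `ndata['label']` and `ndata['train_mask']`; the helper `label_counts` and the toy values are hypothetical.

```python
from collections import Counter


def label_counts(labels, mask):
    """Count how often each label occurs among the nodes where mask is true."""
    return Counter(int(y) for y, keep in zip(labels, mask) if bool(keep))


# Toy stand-ins (hypothetical values) for graph.ndata['label'] and
# graph.ndata['train_mask']; real values would come from dataset[0].
toy_labels = [0, 0, 1, 0, 1, 0]
toy_train_mask = [True, True, True, False, False, True]

train_counts = label_counts(toy_labels, toy_train_mask)
print(train_counts[0], train_counts[1])
```

With a loaded graph, the same helper could presumably be given the label and mask tensors after converting them to lists; that conversion depends on the backend in use and is left out here.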

__len__()[source]

Number of data examples