FraudDataset

class dgl.data.FraudDataset(name, raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)[source]

Bases: DGLBuiltinDataset

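Fraud node prediction dataset.

The dataset includes two multi-relational graphs extracted from Yelp and Amazon, where nodes represent fraudulent reviews or fraudulent reviewers.

It was first proposed in a CIKM’20 paper <https://arxiv.org/pdf/2008.08692.pdf> and has been used as a benchmark by a recent WWW’21 paper <https://ponderly.github.io/pub/PCGNN_WWW2021.pdf>. Another paper <https://arxiv.org/pdf/2104.01404.pdf> also uses the dataset as an example for studying non-homophilous graphs. The dataset is built from industrial data and has rich relational information and distinctive properties such as class imbalance and feature inconsistency, which make it a good test case for how GNNs perform on noisy real-world graphs. The graphs are bidirected and contain no self-loops.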
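The two structural properties named above (bidirected, no self-loops) can be checked on any edge list. The following is a minimal plain-Python sketch, not part of DGL; the helper names `is_bidirected` and `has_self_loops` and the toy edge list are hypothetical and used only for illustration.

```python
def is_bidirected(edges):
    """Return True if every edge (u, v) has a matching reverse edge (v, u)."""
    edge_set = set(edges)
    return all((v, u) in edge_set for (u, v) in edge_set)


def has_self_loops(edges):
    """Return True if any edge connects a node to itself."""
    return any(u == v for (u, v) in edges)


# Toy edge list (hypothetical data): 0 <-> 1 and 1 <-> 2, no self-loops.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
bidirected = is_bidirected(toy_edges)
self_loops = has_self_loops(toy_edges)
```

For a real graph, the edge pairs would first have to be extracted from the graph object; that step is omitted here.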

Reference: <https://github.com/YingtongDou/CARE-GNN>

Parameters:

name (str) – Name of the dataset, either 'yelp' or 'amazon'.
raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/
random_seed (int) – Random seed used when splitting the dataset. Default: 717
train_size (float) – Fraction of the dataset used for training. Default: 0.7
val_size (float) – Fraction of the dataset used for validation. The test set takes the remaining fraction (1 - train_size - val_size). Default: 0.1
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
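The "transformed before every access" behavior of `transform` can be pictured with a small stand-in. This is only a sketch in plain Python: `ToyDataset`, `tag_graph`, and the dict used in place of a graph are hypothetical, and the real class may differ in its details.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that holds a single graph."""

    def __init__(self, graph, transform=None):
        self._graph = graph
        self._transform = transform

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError("this toy dataset holds only one graph")
        if self._transform is None:
            return self._graph
        # The transform is applied on each access, not once at load time.
        return self._transform(self._graph)


calls = []


def tag_graph(graph):
    """Toy transform: record the call and return a modified copy."""
    calls.append(1)
    out = dict(graph)
    out["tagged"] = True
    return out


dataset = ToyDataset({"num_nodes": 3}, transform=tag_graph)
first = dataset[0]
second = dataset[0]  # triggers the transform a second time
```

Since the transform runs on each access in this sketch, a transform that returns a new object rather than mutating its input keeps repeated accesses consistent.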

num_classes

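Number of label classes.

Type: int

graph

Graph structure, etc.

Type: dgl.DGLGraph

seed

Random seed used when splitting the dataset.

Type: int

train_size

Training set size of the dataset.

Type: float

val_size

Validation set size of the dataset.

Type: float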
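To show how `seed`, `train_size`, and `val_size` interact, here is a minimal sketch of a seeded random split into boolean masks. It is an illustration under assumptions, not DGL's implementation: the function name `split_masks` is hypothetical, it uses Python's `random` module, and the library's own split logic may differ.

```python
import random


def split_masks(num_nodes, train_size=0.7, val_size=0.1, seed=717):
    """Return (train_mask, val_mask, test_mask) as lists of booleans."""
    if not 0 <= train_size + val_size <= 1:
        raise ValueError("train_size + val_size must lie in [0, 1]")
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)  # same seed gives the same permutation
    n_train = int(train_size * num_nodes)
    n_val = int(val_size * num_nodes)
    train_idx = set(indices[:n_train])
    val_idx = set(indices[n_train:n_train + n_val])
    train_mask = [i in train_idx for i in range(num_nodes)]
    val_mask = [i in val_idx for i in range(num_nodes)]
    # Every node not used for training or validation is held out for testing.
    test_mask = [not (t or v) for t, v in zip(train_mask, val_mask)]
    return train_mask, val_mask, test_mask


train_mask, val_mask, test_mask = split_masks(100)
```

With the default sizes, the test set receives the remaining 1 - 0.7 - 0.1 = 0.2 of the nodes, and calling the function again with the same seed reproduces the same masks.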

Examples

>>> from dgl.data import FraudDataset
>>> dataset = FraudDataset('yelp')
>>> graph = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = graph.ndata['feature']
>>> label = graph.ndata['label']

__getitem__(idx)[source]

Get the graph object.

Parameters:

idx (int) – Item index

Returns:

Graph structure, node features, node labels, and masks:

ndata['feature']: node features
ndata['label']: node labels
ndata['train_mask']: mask of training set
ndata['val_mask']: mask of validation set
ndata['test_mask']: mask of testing set

Return type:

dgl.DGLGraph
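Because the dataset is class-imbalanced, one common use of the label and mask fields is to derive per-class weights from the training nodes only. The sketch below uses plain Python lists in place of tensors; the function name `class_weights` and the toy data are hypothetical. With a PyTorch backend, such lists could presumably be obtained by calling `.tolist()` on `ndata['label']` and `ndata['train_mask']`.

```python
from collections import Counter


def class_weights(labels, mask):
    """Inverse-frequency class weights over the labels selected by mask."""
    selected = [y for y, keep in zip(labels, mask) if keep]
    counts = Counter(selected)
    total = len(selected)
    # weight = total / (num_classes * count), so rarer classes weigh more.
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Toy data (hypothetical): label 1 marks the rare class.
labels = [0, 0, 0, 1, 0, 1, 0, 0]
train_mask = [True, True, True, True, True, False, False, False]
weights = class_weights(labels, train_mask)
```

Only the five training nodes contribute here (four of class 0, one of class 1), so the rare class receives the larger weight.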

__len__()[source]

Number of data examples.