Fraud Dataset
- class dgl.data.FraudDataset(name, raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)
Bases: DGLBuiltinDataset
Fraud node prediction dataset.
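The dataset consists of two multi-relational graphs extracted from Yelp and Amazon, in which nodes represent fraudulent reviews or fraudulent reviewers.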
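It was first proposed in a CIKM’20 paper <https://arxiv.org/pdf/2008.08692.pdf> and has since been used as a benchmark by a WWW’21 paper <https://ponderly.github.io/pub/PCGNN_WWW2021.pdf>. Another paper <https://arxiv.org/pdf/2104.01404.pdf> also uses the dataset as an example for studying non-homophilous graphs. Because it is built from industrial data, the dataset has rich relational information and distinctive properties such as class imbalance and feature inconsistency, which make it a good testbed for studying how GNNs perform on noisy real-world graphs. The graphs are bidirected and contain no self-loops.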
Reference: <https://github.com/YingtongDou/CARE-GNN>
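The claim that each relation graph is bidirected and free of self-loops can be checked on a plain edge list. The following is a minimal pure-Python sketch; `is_bidirected_without_self_loops` is a hypothetical helper, not part of DGL. With DGL you would first obtain the `src, dst` node IDs of one relation, for example via `graph.edges(etype=...)`, and convert them to lists with `.tolist()`.

```python
def is_bidirected_without_self_loops(src, dst):
    """Hypothetical helper: True if every edge (u, v) has a reverse edge (v, u)
    and no edge is a self-loop.

    src and dst are equal-length sequences of node IDs for one relation.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    edges = set(zip(src, dst))
    for u, v in edges:
        if u == v:               # a self-loop violates the property
            return False
        if (v, u) not in edges:  # the reverse edge is missing
            return False
    return True


# Toy edge list: 0 <-> 1 and 1 <-> 2, each stored in both directions.
src = [0, 1, 1, 2]
dst = [1, 0, 2, 1]
print(is_bidirected_without_self_loops(src, dst))  # True
```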
- Parameters:
name (str) – Name of the dataset, either 'yelp' or 'amazon'.
raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/
random_seed (int) – Random seed used when splitting the dataset. Default: 717
train_size (float) – Fraction of the dataset used for training. Default: 0.7
val_size (float) – Fraction of the dataset used for validation; the test set takes the remaining 1 - train_size - val_size. Default: 0.1
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
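The roles of random_seed, train_size and val_size can be illustrated with a small pure-Python sketch. It is an assumption-based illustration, not DGL's implementation: it shuffles node IDs with a seeded `random.Random`, assigns the first `int(train_size * N)` IDs to training, the next `int(val_size * N)` to validation and the remainder to testing, and returns boolean masks analogous to the `train_mask`, `val_mask` and `test_mask` fields. `random_split_masks` is a hypothetical helper; FraudDataset's own split (for example, which nodes it includes and how it rounds) may differ, so running this with seed 717 will not reproduce DGL's masks.

```python
import random


def random_split_masks(num_nodes, train_size=0.7, val_size=0.1, seed=717):
    """Hypothetical sketch: split node IDs into disjoint train/val/test boolean masks."""
    if not 0 <= train_size + val_size <= 1:
        raise ValueError("train_size + val_size must lie in [0, 1]")
    index = list(range(num_nodes))
    random.Random(seed).shuffle(index)  # seeded, so the split is reproducible
    n_train = int(train_size * num_nodes)
    n_val = int(val_size * num_nodes)
    train_idx = index[:n_train]
    val_idx = index[n_train:n_train + n_val]
    # The remainder, roughly (1 - train_size - val_size) of the nodes, is the test set.
    test_idx = index[n_train + n_val:]

    def to_mask(ids):
        mask = [False] * num_nodes
        for i in ids:
            mask[i] = True
        return mask

    return to_mask(train_idx), to_mask(val_idx), to_mask(test_idx)


train_mask, val_mask, test_mask = random_split_masks(100)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 70 10 20
```

Every node lands in exactly one of the three masks, which is the property the validation and test fractions rely on.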
- graph
Graph structure, etc.
- Type:
DGLGraph
Examples
>>> from dgl.data import FraudDataset
>>> dataset = FraudDataset('yelp')
>>> graph = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = graph.ndata['feature']
>>> label = graph.ndata['label']
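Since the dataset is described as class-imbalanced, one common remedy is to weight the loss by inverse class frequency. The sketch below is a minimal pure-Python illustration, assuming the labels are available as a plain list of integer class IDs (for example `label.tolist()` from the example above). `inverse_frequency_weights` is a hypothetical helper, and this weighting scheme is just one common choice, not something FraudDataset provides.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical helper: weight for class c is N / (num_classes * count_c).

    Classes that never occur in `labels` get weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


# Toy labels: 9 benign (0) nodes and 1 fraudulent (1) node.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels, num_classes=2)
# The minority class receives a weight 9 times larger than the majority class.
print(weights)
```

If you train with PyTorch, such a list could be passed as `weight=torch.tensor(weights)` to `torch.nn.functional.cross_entropy`, so that errors on the rare fraudulent class cost more.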
- __getitem__(idx)
Get the graph object.
- Parameters:
idx (int) – Item index
- Returns:
Graph structure, node features, node labels and masks
ndata['feature']: node features
ndata['label']: node labels
ndata['train_mask']: mask of the training set
ndata['val_mask']: mask of the validation set
ndata['test_mask']: mask of the test set
- Return type:
DGLGraph
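The masks are per-node flags, so selecting the training (or validation, or test) nodes amounts to boolean indexing. Assuming the PyTorch backend and boolean mask tensors, this would be written `feat[graph.ndata['train_mask']]`. The pure-Python sketch below shows the same idea on plain lists; `select_by_mask` is a hypothetical helper and the toy values merely stand in for the ndata fields.

```python
def select_by_mask(values, mask):
    """Hypothetical helper: keep the entries of `values` whose mask flag is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [v for v, keep in zip(values, mask) if keep]


# Toy stand-ins for graph.ndata['feature'], ['label'] and ['train_mask'].
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
labels = [0, 1, 0, 0]
train_mask = [True, False, True, False]

train_feat = select_by_mask(features, train_mask)
train_label = select_by_mask(labels, train_mask)
print(train_feat, train_label)  # [[0.1, 0.2], [0.5, 0.6]] [0, 0]
```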