FB15kDataset

class dgl.data.FB15kDataset(reverse=True, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: KnowledgeGraphDataset

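FB15k link prediction dataset.

The FB15k dataset was introduced in "Translating Embeddings for Modeling Multi-relational Data". It is a subset of Freebase that contains 14,951 entities and 1,345 distinct relations. By default, when the dataset is created, a reverse edge with a reversed relation type is added for each edge.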

FB15k dataset statistics:

  • Nodes: 14,951

  • Number of relation types: 1,345

  • Number of reversed relation types: 1,345

  • Label split:

    • Train: 483,142

    • Valid: 50,000

    • Test: 59,071
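The reverse-edge convention described above can be illustrated with a small, library-free sketch. It assumes that the reverse of relation `r` receives the ID `r + num_rels`, which would place the 1,345 reversed relation types directly after the 1,345 original ones. The helper `add_reverse_triples` and the toy triples are hypothetical and not part of DGL.

```python
def add_reverse_triples(triples, num_rels):
    """Return the input triples plus one reversed triple per input triple.

    Assumption: the reverse of relation r is encoded as r + num_rels.
    Each triple is (head, relation, tail).
    """
    reversed_triples = [(t, r + num_rels, h) for (h, r, t) in triples]
    return list(triples) + reversed_triples


# Toy knowledge graph with 3 entities and 2 relation types (not real FB15k data).
toy_triples = [(0, 0, 1), (1, 1, 2)]
all_triples = add_reverse_triples(toy_triples, num_rels=2)
print(all_triples)
```

With `reverse=True`, the number of edges and the number of relation type IDs both double under this scheme; with `reverse=False`, only the original triples would remain.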

Parameters:
  • reverse (bool) – Whether to add reverse edges. Default: True.

  • raw_dir (str) – Directory where the raw input data is downloaded and stored. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.

num_nodes

Number of nodes

Type:

int

num_rels

Number of relation types

Type:

int

Examples

>>> import torch as th
>>> from dgl.data import FB15kDataset
>>>
>>> dataset = FB15kDataset()
>>> g = dataset[0]  # the dataset holds a single graph
>>> e_type = g.edata['e_type']
>>>
>>> # get data split (these masks include the reversed edges)
>>> train_mask = g.edata['train_mask']
>>> val_mask = g.edata['val_mask']
>>>
>>> # convert the boolean masks to edge IDs
>>> train_set = th.arange(g.num_edges())[train_mask]
>>> val_set = th.arange(g.num_edges())[val_mask]
>>>
>>> # build train_g from the training edges only
>>> train_edges = train_set
>>> train_g = g.edge_subgraph(train_edges,
...                           relabel_nodes=False)
>>> train_g.edata['e_type'] = e_type[train_edges]
>>>
>>> # build val_g from the training and validation edges
>>> val_edges = th.cat([train_edges, val_set])
>>> val_g = g.edge_subgraph(val_edges,
...                         relabel_nodes=False)
>>> val_g.edata['e_type'] = e_type[val_edges]
>>>
>>> # Train, Validation and Test

__getitem__(idx)

Get the graph object.

Parameters:

idx (int) – Item index. FB15kDataset has only one graph object.

Returns:

The graph contains

  • edata['e_type']: edge relation type

  • edata['train_edge_mask']: positive training edge mask

  • edata['val_edge_mask']: positive validation edge mask

  • edata['test_edge_mask']: positive testing edge mask

  • edata['train_mask']: training edge set mask (includes reversed training edges)

  • edata['val_mask']: validation edge set mask (includes reversed validation edges)

  • edata['test_mask']: testing edge set mask (includes reversed testing edges)

  • ndata['ntype']: node type. All 0 in this dataset

Return type:

dgl.DGLGraph
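
To make the difference between the two families of masks concrete, here is a toy sketch in plain Python. It assumes, purely for illustration, that the forward edges are stored first and their reversed copies follow in the same order; the actual edge ordering in the DGL graph may differ. `build_split_masks` is a hypothetical helper, not a DGL function.

```python
def build_split_masks(split_of_edge, split):
    """Build toy masks for one split ('train', 'val' or 'test').

    split_of_edge lists the split of each forward edge.
    Assumption: reversed copies are appended after the forward edges,
    in the same order.
    """
    forward = [s == split for s in split_of_edge]
    # Positive mask: only the original (forward) edges of this split.
    positive_mask = forward + [False] * len(forward)
    # Set mask: forward edges of this split plus their reversed copies.
    set_mask = forward + forward
    return positive_mask, set_mask


# Four forward edges, so the toy graph has eight edges once reversed.
toy_splits = ["train", "train", "val", "test"]
train_edge_mask, train_mask = build_split_masks(toy_splits, "train")
print(sum(train_edge_mask), sum(train_mask))
```

In this toy setting the set mask selects twice as many edges as the positive mask, which matches the description of `train_mask` as including the reversed training edges.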

__len__()

Number of graphs in the dataset.