Quick Start¶
This tutorial gives a quick overview of GraphScope's features. To begin, we will install GraphScope on your local machine with Python. Although most examples in this guide are based on a local Python environment, they also work on a Kubernetes cluster.
You can easily install GraphScope with pip:
python3 -m pip install graphscope -U
Note
We recommend installing GraphScope with Python 3.9 in a clean Python virtual environment, created with either miniconda or venv.
Taking venv as an example, here is a step-by-step guide to creating a virtual environment, activating it, and installing GraphScope:
# Create a new virtual environment
python3.9 -m venv tutorial-env
# Activate the virtual environment
source tutorial-env/bin/activate
# Install GraphScope
python3.9 -m pip install graphscope
# Use GraphScope
python3.9
>>> import graphscope as gs
>>> ......
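If you prefer miniconda, an equivalent setup might look like the sketch below (assuming conda is already installed; the environment name tutorial-env is arbitrary):
# Create a new conda environment with Python 3.9
conda create -n tutorial-env python=3.9
# Activate the conda environment
conda activate tutorial-env
# Install GraphScope
python3 -m pip install graphscope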
One-stop Graph Processing¶
We will walk through an example, step by step, to show how GraphScope processes a variety of graph computation tasks in a one-stop manner.
The example targets a node classification task on a citation network.
ogbn-mag is a heterogeneous graph composed of a subset of the Microsoft Academic Graph. It contains 4 types of entities (i.e., papers, authors, institutions, and fields of study), as well as four types of directed relations connecting two entities.
Given the heterogeneous ogbn-mag data, the task is to predict the class of each paper. Node classification can identify the venues of papers, which represent groups of scientific work on different topics. We use both attribute and structural information to classify papers. In the graph, each paper node carries a 128-dimensional word2vec vector representing its content, obtained by averaging the embeddings of the words in its title and abstract. The embeddings of individual words are pre-trained, while the structural information is computed on the fly.
GraphScope models graph data as property graphs, in which edges/vertices are labeled and carry properties. Taking ogbn-mag as an example, the figure below shows the model of the property graph.
Sample of the property graph¶
The graph has four kinds of vertices, labeled paper, author, institution, and field_of_study. The vertices are connected by four kinds of edges, each carrying a label and specifying the labels of the vertices at its two ends. For example, a cites edge connects two vertices labeled paper; another example is the writes edge, which requires its source vertex to be labeled author and its destination vertex to be a paper vertex. All the vertices and edges may have properties; e.g., paper vertices have properties such as features, publish year, and subject label.
Import GraphScope and load a graph
To load this graph into GraphScope with our retrieval module, use the following code.
import graphscope
from graphscope.dataset import load_ogbn_mag
g = load_ogbn_mag()
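After loading, you can check that the schema of the property graph matches the model described above. A minimal sketch, assuming the graph object exposes a printable schema attribute as in recent GraphScope releases:
# inspect the vertex/edge labels and their properties of the loaded graph (schema attribute assumed)
print(g.schema)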
Interactive queries allow users to explore, examine, and present graph data in a flexible and in-depth manner, so that specific information can be located quickly. GraphScope makes interactive querying easy by supporting the popular query languages Gremlin and Cypher, and ensures that such queries are executed efficiently at scale.
Run interactive queries with Gremlin and Cypher
In this example, we use graph traversals to count the number of papers co-authored by two given authors. To simplify the query, we assume the two authors can be uniquely identified by the IDs 2 and 4307, respectively.
# get the endpoint for submitting interactive queries on graph g.
interactive = graphscope.interactive(g, with_cypher=True)
# Gremlin query for counting the number of papers two authors (with id 2 and 4307) have co-authored
papers = interactive.execute("g.V().has('author', 'id', 2).out('writes').where(__.in('writes').has('id', 4307)).count()").one()
# Cypher query for counting the number of papers two authors (with id 2 and 4307) have co-authored
# Note that for Cypher query, the parameter of lang="cypher" is mandatory
papers = interactive.execute( \
"MATCH (n1:author)-[:writes]->(p:paper)<-[:writes]-(n2:author) \
WHERE n1.id = 2 AND n2.id = 4307 \
RETURN count(DISTINCT p)", \
lang="cypher")
Graph analytics is widely used in the real world. Many algorithms, such as community detection, paths and connectivity, and centrality, have proven to be very useful in various business scenarios. GraphScope ships with a set of built-in algorithms, enabling users to easily analyze their graph data.
Run analytical algorithms on the graph
Continuing our example, we first derive a subgraph by extracting publications within a specific time range from the entire graph (with Gremlin!). We then run k-core decomposition and triangle counting to generate the structural features of each paper vertex.
Please note that many algorithms may only work on homogeneous graphs. Hence, to evaluate these algorithms over a property graph, we need to project it into a simple graph first.
# extract a subgraph of publication within a time range
sub_graph = interactive.subgraph("g.V().has('year', gte(2014).and(lte(2020))).outE('cites')")
# project the subgraph to a simple graph.
simple_g = sub_graph.project(vertices={"paper": []}, edges={"cites": []})
ret1 = graphscope.k_core(simple_g, k=5)
ret2 = graphscope.triangles(simple_g)
# add the results as new columns to the citation graph
sub_graph = sub_graph.add_column(ret1, {"kcore": "r"})
sub_graph = sub_graph.add_column(ret2, {"tc": "r"})
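Before or after attaching the results as new columns, you can inspect them directly. A minimal sketch using the same selector style as the analytical example later in this tutorial:
# peek at the k-core value computed for each paper vertex
print(ret1.to_dataframe(selector={'id': 'v.id', 'kcore': 'r'}).head())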
Graph neural networks (GNNs) combine the advantages of graph analytics and machine learning. GNN algorithms can compress both the structural and attribute information of a graph into low-dimensional embedding vectors for each node. These embeddings can then be fed into downstream machine learning tasks.
Prepare data and engine for learning
In our example, we train a supervised GraphSAGE model to classify the nodes (papers) into 349 categories, each of which represents a venue (e.g., a preprint server or a conference). To do so, we first launch a learning engine and build a graph with features, following the previous step.
# define the features for learning
paper_features = [f"feat_{i}" for i in range(128)]
paper_features.append("kcore")
paper_features.append("tc")
# launch a learning engine.
lg = graphscope.graphlearn(sub_graph, nodes=[("paper", paper_features)],
                           edges=[("paper", "cites", "paper")],
                           # gen_labels splits the "paper" vertices into 100 buckets:
                           # 75% for training, 10% for validation, and 15% for test.
                           gen_labels=[
                               ("train", "paper", 100, (0, 75)),
                               ("val", "paper", 100, (75, 85)),
                               ("test", "paper", 100, (85, 100))
                           ])
Then we define the training process and run it.
Define the training process and run it
try:
    # https://www.tensorflow.org/guide/migrate
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    import tensorflow as tf
import graphscope.learning
from graphscope.learning.examples import EgoGraphSAGE
from graphscope.learning.examples import EgoSAGESupervisedDataLoader
from graphscope.learning.examples.tf.trainer import LocalTrainer
# supervised GraphSAGE.
def train_sage(graph, node_type, edge_type, class_num, features_num,
               hops_num=2, nbrs_num=[25, 10], epochs=2,
               hidden_dim=256, in_drop_rate=0.5, learning_rate=0.01,
               ):
    graphscope.learning.reset_default_tf_graph()
    dimensions = [features_num] + [hidden_dim] * (hops_num - 1) + [class_num]
    model = EgoGraphSAGE(dimensions, act_func=tf.nn.relu, dropout=in_drop_rate)
    # prepare train dataset
    train_data = EgoSAGESupervisedDataLoader(
        graph, graphscope.learning.Mask.TRAIN,
        node_type=node_type, edge_type=edge_type, nbrs_num=nbrs_num, hops_num=hops_num,
    )
    train_embedding = model.forward(train_data.src_ego)
    train_labels = train_data.src_ego.src.labels
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=train_labels, logits=train_embedding,
        )
    )
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    # prepare test dataset
    test_data = EgoSAGESupervisedDataLoader(
        graph, graphscope.learning.Mask.TEST,
        node_type=node_type, edge_type=edge_type, nbrs_num=nbrs_num, hops_num=hops_num,
    )
    test_embedding = model.forward(test_data.src_ego)
    test_labels = test_data.src_ego.src.labels
    test_indices = tf.math.argmax(test_embedding, 1, output_type=tf.int32)
    test_acc = tf.div(
        tf.reduce_sum(tf.cast(tf.math.equal(test_indices, test_labels), tf.float32)),
        tf.cast(tf.shape(test_labels)[0], tf.float32),
    )
    # train and test
    trainer = LocalTrainer()
    trainer.train(train_data.iterator, loss, optimizer, epochs=epochs)
    trainer.test(test_data.iterator, test_acc)

train_sage(lg, node_type="paper", edge_type="cites",
           class_num=349,     # output dimension
           features_num=130,  # input dimension, 128 + kcore + triangle count
           )
Quick Start for Graph Analytical Tasks¶
The installed graphscope package includes everything you need to analyze a graph on your local machine. If you have a graph analytical task that requires running iterative algorithms, it works well with graphscope.
Example: Running iterative algorithm (SSSP) in GraphScope
import graphscope as gs
from graphscope.dataset.modern_graph import load_modern_graph
gs.set_option(show_log=True)
# load the modern graph as example.
#(modern graph is an example property graph given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
graph = load_modern_graph()
# triggers label propagation algorithm(LPA)
# on the modern graph(property graph) and print the result.
ret = gs.lpa(graph)
print(ret.to_dataframe(selector={'id': 'v.id', 'label': 'r'}))
# project a modern graph (property graph) to a homogeneous graph
# and run single source shortest path(SSSP) algorithm on it, with assigned source=1.
pg = graph.project(vertices={'person': None}, edges={'knows': ['weight']})
ret = gs.sssp(pg, src=1)
print(ret.to_dataframe(selector={'id': 'v.id', 'distance': 'r'}))
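The same projected graph can be fed to other built-in algorithms mentioned above, such as centrality measures. A hedged sketch, assuming the built-in PageRank app is applicable here with its default parameters (the output column name 'rank' is only illustrative):
# run PageRank on the projected homogeneous graph and print the result.
ret = gs.pagerank(pg)
print(ret.to_dataframe(selector={'id': 'v.id', 'rank': 'r'}))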
Quick Start for Interactive Queries in GraphScope¶
With the graphscope package installed, you can easily interact with graph data on your local machine.
You only need to create an interactive instance, which serves as the channel for submitting Gremlin or Cypher queries.
Example: Run Interactive Queries in GraphScope
import graphscope as gs
from graphscope.dataset.modern_graph import load_modern_graph
gs.set_option(show_log=True)
# load the modern graph as example.
#(modern graph is an example property graph given by Apache at https://tinkerpop.apache.org/docs/current/tutorials/getting-started/)
graph = load_modern_graph()
# Hereafter, you can use the `graph` object to create an `interactive` query session, which will start one Gremlin service and one Cypher service simultaneously on the backend.
g = gs.interactive(graph, with_cypher=True)
# then `execute` any supported gremlin query.
q1 = g.execute('g.V().count()')
print(q1.all().result()) # should print [6]
q2 = g.execute('g.V().hasLabel(\'person\')')
print(q2.all().result()) # should print [[v[2], v[3], v[0], v[1]]]
# or `execute` any supported Cypher query
q3 = g.execute("MATCH (n:person) RETURN count(n)", lang="cypher")
print(q3.records[0][0]) # should print 6
Quick Start for Graph Learning¶
Training a GNN model with GraphScope is simple and intuitive. You can use the graphscope package to train a GNN model on your local machine. Please note that tensorflow is required to run the following example.
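If TensorFlow is not yet installed in your environment, it can typically be installed with pip; the exact version to pin may depend on your GraphScope release, so treat this as a sketch:
python3 -m pip install tensorflow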
Example: Training GraphSAGE Model in GraphScope
import graphscope
from graphscope.dataset import load_ogbn_mag
g = load_ogbn_mag()
# define the features for learning
paper_features = [f"feat_{i}" for i in range(128)]
# launch a learning engine.
lg = graphscope.graphlearn(g, nodes=[("paper", paper_features)],
                           edges=[("paper", "cites", "paper")],
                           gen_labels=[
                               ("train", "paper", 100, (0, 75)),
                               ("val", "paper", 100, (75, 85)),
                               ("test", "paper", 100, (85, 100))
                           ])
try:
    # https://www.tensorflow.org/guide/migrate
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    import tensorflow as tf
import graphscope.learning
from graphscope.learning.examples import EgoGraphSAGE
from graphscope.learning.examples import EgoSAGESupervisedDataLoader
from graphscope.learning.examples.tf.trainer import LocalTrainer
# supervised GraphSAGE
def train_sage(graph, node_type, edge_type, class_num, features_num,
               hops_num=2, nbrs_num=[25, 10], epochs=2,
               hidden_dim=256, in_drop_rate=0.5, learning_rate=0.01,
               ):
    graphscope.learning.reset_default_tf_graph()
    dimensions = [features_num] + [hidden_dim] * (hops_num - 1) + [class_num]
    model = EgoGraphSAGE(dimensions, act_func=tf.nn.relu, dropout=in_drop_rate)
    # prepare train dataset
    train_data = EgoSAGESupervisedDataLoader(
        graph, graphscope.learning.Mask.TRAIN,
        node_type=node_type, edge_type=edge_type, nbrs_num=nbrs_num, hops_num=hops_num,
    )
    train_embedding = model.forward(train_data.src_ego)
    train_labels = train_data.src_ego.src.labels
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=train_labels, logits=train_embedding,
        )
    )
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    # prepare test dataset
    test_data = EgoSAGESupervisedDataLoader(
        graph, graphscope.learning.Mask.TEST,
        node_type=node_type, edge_type=edge_type, nbrs_num=nbrs_num, hops_num=hops_num,
    )
    test_embedding = model.forward(test_data.src_ego)
    test_labels = test_data.src_ego.src.labels
    test_indices = tf.math.argmax(test_embedding, 1, output_type=tf.int32)
    test_acc = tf.div(
        tf.reduce_sum(tf.cast(tf.math.equal(test_indices, test_labels), tf.float32)),
        tf.cast(tf.shape(test_labels)[0], tf.float32),
    )
    # train and test
    trainer = LocalTrainer()
    trainer.train(train_data.iterator, loss, optimizer, epochs=epochs)
    trainer.test(test_data.iterator, test_acc)

train_sage(lg, node_type="paper", edge_type="cites",
           class_num=349,     # output dimension
           features_num=128,  # input dimension
           )