Creating GraphFrames

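Users can create a GraphFrame from a vertex DataFrame and an edge DataFrame. The vertex DataFrame must contain a column named "id" that uniquely identifies each vertex. The edge DataFrame must contain a column named "src" (the source vertex ID of each edge) and a column named "dst" (the destination vertex ID).

Both DataFrames may contain any number of additional columns. These columns represent vertex and edge attributes.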
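The column convention can be illustrated in plain Python: a vertex table needs an "id" column, an edge table needs "src" and "dst", and every other column is an attribute. The helper `split_columns` below is hypothetical and not part of the GraphFrames API. It only checks column names, using the ones from the example in this section.

```python
# Hypothetical helper, not a GraphFrames API: it only mirrors the column
# convention (vertices need "id", edges need "src" and "dst").
VERTEX_REQUIRED = ("id",)
EDGE_REQUIRED = ("src", "dst")


def split_columns(columns, required):
    """Return the attribute columns, raising if a required column is missing."""
    missing = [c for c in required if c not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Every column that is not required is treated as an attribute.
    return [c for c in columns if c not in required]


vertex_attrs = split_columns(["id", "name", "age"], VERTEX_REQUIRED)
edge_attrs = split_columns(["src", "dst", "relationship"], EDGE_REQUIRED)
print(vertex_attrs)  # ['name', 'age']
print(edge_attrs)    # ['relationship']
```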

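A GraphFrame can also be constructed from a single DataFrame that contains only edge information. In that case, the vertices are inferred from the source and destination IDs of the edges.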
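The inference step can be sketched in plain Python using the edge list from the example in this section. This is an illustration of the idea, not the GraphFrames implementation. The vertex set is taken to be the distinct union of all source and destination IDs. In this data, vertex "g" (Gabby) appears in no edge, so it would be missing from an inferred vertex set. Attributes such as name and age also cannot be recovered from the edges alone.

```python
# Edge list from the example in this section: (src, dst, relationship).
edges = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
    ("f", "c", "follow"),
    ("e", "f", "follow"),
    ("e", "d", "friend"),
    ("d", "a", "friend"),
    ("a", "e", "friend"),
]


def infer_vertex_ids(edge_rows):
    """Collect the distinct source and destination IDs of the given edges."""
    ids = set()
    for src, dst, *_attrs in edge_rows:
        ids.add(src)
        ids.add(dst)
    return sorted(ids)


vertex_ids = infer_vertex_ids(edges)
print(vertex_ids)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

In PySpark, the same distinct union could be written as `e.select(e.src.alias("id")).union(e.select(e.dst.alias("id"))).distinct()`. The result is a single-column vertex DataFrame suitable for `GraphFrame(v, e)`.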

The following example shows how to create a GraphFrame from a vertex DataFrame and an edge DataFrame.

Python API

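# Assumes an active SparkSession named `spark` with the GraphFrames package available
from graphframes import GraphFrame

# Vertex DataFrame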
v = spark.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
    ("d", "David", 29),
    ("e", "Esther", 32),
    ("f", "Fanny", 36),
    ("g", "Gabby", 60)
], ["id", "name", "age"])

# Edge DataFrame
e = spark.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
    ("f", "c", "follow"),
    ("e", "f", "follow"),
    ("e", "d", "friend"),
    ("d", "a", "friend"),
    ("a", "e", "friend")
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

The GraphFrame built above is also available as a ready-made example graph in the GraphFrames package (this is not supported in Spark Connect mode):

from graphframes.examples import Graphs

g = Graphs(spark).friends()  # Get example graph

Scala API

import org.graphframes.GraphFrame

// Vertex DataFrame
val v = spark.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
)).toDF("id", "name", "age")

// Edge DataFrame
val e = spark.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
)).toDF("src", "dst", "relationship")

// Create a GraphFrame
val g = GraphFrame(v, e)

The GraphFrame built above is also available as a ready-made example graph in the GraphFrames package:

import org.graphframes.{examples, GraphFrame}

val g: GraphFrame = examples.Graphs.friends