Creating GraphFrames
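Users can create GraphFrames from vertex and edge DataFrames.
- Vertex DataFrame: A vertex DataFrame should contain a special column named "id" that specifies a unique identifier for each vertex in the graph.
- Edge DataFrame: An edge DataFrame should contain two special columns: "src" (the source vertex ID of the edge) and "dst" (the destination vertex ID of the edge).
Both DataFrames can contain arbitrary additional columns. These columns can represent vertex and edge attributes.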
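The column conventions above can be checked before any Spark job runs. The following is a minimal plain-Python sketch, not part of the GraphFrames API: the helper name `find_schema_problems` and the representation of rows as dictionaries are assumptions made for illustration only.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def find_schema_problems(vertices: List[Row], edges: List[Row]) -> List[str]:
    """Hypothetical helper: list violations of the "id"/"src"/"dst" convention.

    An empty result means the rows follow the convention described above.
    """
    problems: List[str] = []

    # Every vertex row needs an "id" column.
    if any("id" not in row for row in vertices):
        problems.append('vertex rows must have an "id" column')
        return problems

    # The "id" values are meant to identify vertices uniquely.
    ids = [row["id"] for row in vertices]
    if len(set(ids)) != len(ids):
        problems.append('vertex "id" values must be unique')

    # Every edge row needs both "src" and "dst"; other columns are free-form.
    for column in ("src", "dst"):
        if any(column not in row for row in edges):
            problems.append(f'edge rows must have a "{column}" column')

    return problems


vertices = [
    {"id": "a", "name": "Alice", "age": 34},
    {"id": "b", "name": "Bob", "age": 36},
]
edges = [{"src": "a", "dst": "b", "relationship": "friend"}]

print(find_schema_problems(vertices, edges))  # prints []
```

The extra "name", "age", and "relationship" keys pass through untouched, which mirrors how additional columns act as vertex and edge attributes.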
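A GraphFrame can also be constructed from a single DataFrame containing edge information. The vertices are then inferred from the source and destination vertices of the edges.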
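To make the inference step concrete, here is a plain-Python sketch of the idea, assuming edges are given as (src, dst) pairs. It does not call GraphFrames or Spark, and the function name `infer_vertex_ids` is hypothetical; it only illustrates that the vertex set is the set of distinct IDs appearing as a source or a destination.

```python
from typing import Iterable, List, Tuple


def infer_vertex_ids(edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect the distinct vertex IDs that appear as "src" or "dst"."""
    ids = set()
    for src, dst in edges:
        ids.add(src)
        ids.add(dst)
    # Sort only to make the result deterministic for display.
    return sorted(ids)


# (src, dst) pairs taken from the edge DataFrame used in this section.
edges = [
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e"),
]

print(infer_vertex_ids(edges))  # prints ['a', 'b', 'c', 'd', 'e', 'f']
```

Under this logic a vertex that has no edges, such as "g" in this section's vertex DataFrame, cannot be recovered from the edges alone, and vertex attributes such as "name" or "age" are not available either.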
The following example demonstrates how to create a GraphFrame from vertex and edge DataFrames.
Python API
from graphframes import GraphFrame

# Assumes an active SparkSession named `spark`.
# Vertex DataFrame: "id" is the required column; "name" and "age" are attributes.
v = spark.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
    ("d", "David", 29),
    ("e", "Esther", 32),
    ("f", "Fanny", 36),
    ("g", "Gabby", 60)
], ["id", "name", "age"])
# Edge DataFrame: "src" and "dst" are required; "relationship" is an attribute.
e = spark.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
    ("f", "c", "follow"),
    ("e", "f", "follow"),
    ("e", "d", "friend"),
    ("d", "a", "friend"),
    ("a", "e", "friend")
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)
The GraphFrame constructed above is also available as an example graph in the GraphFrames package (not available in Spark Connect mode):
from graphframes.examples import Graphs
g = Graphs(spark).friends() # Get example graph
Scala API
import org.graphframes.GraphFrame
// Vertex DataFrame
val v = spark.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
)).toDF("id", "name", "age")
// Edge DataFrame
val e = spark.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
)).toDF("src", "dst", "relationship")
// Create a GraphFrame
val g = GraphFrame(v, e)
The GraphFrame constructed above is also available as an example graph in the GraphFrames package:
import org.graphframes.{examples, GraphFrame}
val g: GraphFrame = examples.Graphs.friends