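Basic Graph Operations
Basics
GraphFrames provides several simple graph queries, such as node degree. In addition, because a GraphFrame represents a graph as a pair of vertex and edge DataFrames, it is easy to run powerful queries directly on those DataFrames. They are exposed on the GraphFrame as the vertices and edges fields.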
Python API
from graphframes.examples import Graphs
g = Graphs(spark).friends() # Get example graph
# Display the vertex DataFrame
g.vertices.show()
# +--+-------+---+
# |id|   name|age|
# +--+-------+---+
# | a|  Alice| 34|
# | b|    Bob| 36|
# | c|Charlie| 30|
# | d|  David| 29|
# | e| Esther| 32|
# | f|  Fanny| 36|
# | g|  Gabby| 60|
# +--+-------+---+
# Display the edge DataFrame
g.edges.show()
# +---+---+------------+
# |src|dst|relationship|
# +---+---+------------+
# |  a|  b|      friend|
# |  b|  c|      follow|
# |  c|  b|      follow|
# |  f|  c|      follow|
# |  e|  f|      follow|
# |  e|  d|      friend|
# |  d|  a|      friend|
# |  a|  e|      friend|
# +---+---+------------+
# Get a DataFrame with columns "id" and "inDegree" (in-degree)
vertexInDegrees = g.inDegrees
vertexInDegrees.show()
# Find the youngest user's age in the graph
# This queries the vertex DataFrame
g.vertices.groupBy().min("age").show()
# Count the number of "follows" in the graph
# This queries the edge DataFrame
numFollows = g.edges.filter("relationship = 'follow'").count()
Scala API
import org.graphframes.{examples,GraphFrame}
val g: GraphFrame = examples.Graphs.friends // get example graph
// Display the vertex and edge DataFrames
g.vertices.show()
// +--+-------+---+
// |id|   name|age|
// +--+-------+---+
// | a|  Alice| 34|
// | b|    Bob| 36|
// | c|Charlie| 30|
// | d|  David| 29|
// | e| Esther| 32|
// | f|  Fanny| 36|
// | g|  Gabby| 60|
// +--+-------+---+
g.edges.show()
// +---+---+------------+
// |src|dst|relationship|
// +---+---+------------+
// |  a|  b|      friend|
// |  b|  c|      follow|
// |  c|  b|      follow|
// |  f|  c|      follow|
// |  e|  f|      follow|
// |  e|  d|      friend|
// |  d|  a|      friend|
// |  a|  e|      friend|
// +---+---+------------+
// import Spark SQL package
import org.apache.spark.sql.DataFrame
// Get a DataFrame with columns "id" and "inDegree" (in-degree)
val vertexInDegrees: DataFrame = g.inDegrees
vertexInDegrees.show()
// Find the youngest user's age in the graph.
// This queries the vertex DataFrame.
g.vertices.groupBy().min("age").show()
// Count the number of "follows" in the graph.
// This queries the edge DataFrame.
val numFollows = g.edges.filter("relationship = 'follow'").count()
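To make the results of the queries above concrete, here is a minimal, Spark-free sketch in plain Python that recomputes them by hand from the example tables. The lists and variable names are illustrative only and are not part of the GraphFrames API. It also assumes that an in-degree result lists only vertices with at least one incoming edge, which is how a count grouped by dst naturally behaves.

```python
from collections import Counter

# Rows copied from the example "friends" graph shown above
vertices = [
    ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30),
    ("d", "David", 29), ("e", "Esther", 32), ("f", "Fanny", 36),
    ("g", "Gabby", 60),
]
edges = [
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend"),
]

# In-degree: number of edges arriving at each vertex (count grouped by dst)
in_degrees = Counter(dst for _src, dst, _rel in edges)

# Youngest user's age: minimum over the "age" column of the vertex rows
youngest_age = min(age for _id, _name, age in vertices)

# Number of "follow" edges: filter the edge rows, then count
num_follows = sum(1 for _src, _dst, rel in edges if rel == "follow")

print(dict(in_degrees))
print(youngest_age, num_follows)
```

Vertex g has no incoming edges, so it does not appear in the tally at all.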
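Filtering Edges or Vertices
GraphFrames provides an API for filtering edges and vertices by their attributes.
Note: This API is intended for simple filtering. For more complex use cases, consider the PropertyGraphFrame model. PropertyGraphFrame manages the logical schema of the whole graph and provides a more powerful API for selecting any subgraph by the desired properties and filters.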
Python API
from pyspark.sql import functions as F
from graphframes.examples import Graphs
g = Graphs(spark).friends() # Get example graph
# Each call returns a new GraphFrame; g itself is unchanged
alice = g.filterVertices(F.col("name") == F.lit("Alice"))
follows = g.filterEdges(F.col("relationship") == F.lit("follow"))
Scala API
import org.apache.spark.sql.functions._
import org.graphframes.{examples,GraphFrame}
val g: GraphFrame = examples.Graphs.friends
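// Each call returns a new GraphFrame; g itself is unchanged
val alice: GraphFrame = g.filterVertices(col("name") === lit("Alice"))
val follows: GraphFrame = g.filterEdges(col("relationship") === lit("follow"))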
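The two filters differ in what they leave behind, which a small plain-Python model can illustrate. This sketch rests on an assumption worth checking against the GraphFrames API reference: that filtering vertices also drops any edge touching a removed vertex, while filtering edges keeps every vertex. The helpers filter_vertices and filter_edges are hypothetical stand-ins written for this example, not library functions.

```python
# Rows copied from the example "friends" graph
vertices = [
    ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30),
    ("d", "David", 29), ("e", "Esther", 32), ("f", "Fanny", 36),
    ("g", "Gabby", 60),
]
edges = [
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend"),
]


def filter_vertices(vertices, edges, keep):
    """Hypothetical model: keep matching vertices and only the edges
    whose source and destination both survive (assumed semantics)."""
    kept_vertices = [v for v in vertices if keep(v)]
    kept_ids = {v[0] for v in kept_vertices}
    kept_edges = [e for e in edges if e[0] in kept_ids and e[1] in kept_ids]
    return kept_vertices, kept_edges


def filter_edges(vertices, edges, keep):
    """Hypothetical model: keep matching edges and leave every vertex
    in place (assumed semantics)."""
    return list(vertices), [e for e in edges if keep(e)]


# Keep only the vertex named "Alice"
alice_v, alice_e = filter_vertices(vertices, edges, lambda v: v[1] == "Alice")

# Keep only the "follow" edges
follow_v, follow_e = filter_edges(vertices, edges, lambda e: e[2] == "follow")

print(alice_v, alice_e)
print(len(follow_v), len(follow_e))
```

Under these assumptions, the Alice-only graph has one vertex and no edges, because every edge involving Alice also involves a removed vertex, while the follow-only graph still contains all seven vertices.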