Basic Graph Operations

Basics

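GraphFrames provides several simple graph queries, such as vertex degrees. In addition, because GraphFrames represents a graph as a pair of vertex and edge DataFrames, powerful queries can be run directly on those DataFrames. They are exposed in a GraphFrame as the vertices and edges fields.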

Python API

from graphframes.examples import Graphs

g = Graphs(spark).friends()  # Get example graph

# Display the vertex DataFrame
g.vertices.show()

# +--+-------+---+
# |id|   name|age|
# +--+-------+---+
# | a|  Alice| 34|
# | b|    Bob| 36|
# | c|Charlie| 30|
# | d|  David| 29|
# | e| Esther| 32|
# | f|  Fanny| 36|
# | g|  Gabby| 60|
# +--+-------+---+

# Display the edge DataFrame
g.edges.show()

# +---+---+------------+
# |src|dst|relationship|
# +---+---+------------+
# |  a|  b|      friend|
# |  b|  c|      follow|
# |  c|  b|      follow|
# |  f|  c|      follow|
# |  e|  f|      follow|
# |  e|  d|      friend|
# |  d|  a|      friend|
# |  a|  e|      friend|
# +---+---+------------+

# Get a DataFrame with columns "id" and "inDegree" (in-degree)
vertexInDegrees = g.inDegrees

# Find the youngest user's age in the graph
# This queries the vertex DataFrame
g.vertices.groupBy().min("age").show()

# Count the number of "follows" in the graph
# This queries the edge DataFrame
numFollows = g.edges.filter("relationship = 'follow'").count()
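To make the semantics of these three queries concrete without a Spark session, here is a plain-Python sketch over the same example data (values copied from the show() output above). It is an illustration only, not the GraphFrames implementation, and the helper names in_degrees, min_age and count_relationship are hypothetical. As we understand it, inDegrees is computed by grouping the edge DataFrame by dst, so a vertex with no incoming edges (here g, Gabby) is absent from the result rather than listed with 0.

```python
from collections import Counter

# Hypothetical plain-Python copies of the example graph's tables.
VERTICES = [  # (id, name, age)
    ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30),
    ("d", "David", 29), ("e", "Esther", 32), ("f", "Fanny", 36),
    ("g", "Gabby", 60),
]
EDGES = [  # (src, dst, relationship)
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend"),
]


def in_degrees(edges):
    """Count incoming edges per destination id (like grouping edges by dst)."""
    return dict(Counter(dst for _src, dst, _rel in edges))


def min_age(vertices):
    """Smallest value in the age column of the vertex table."""
    return min(age for _id, _name, age in vertices)


def count_relationship(edges, rel):
    """Number of edges whose relationship column equals rel."""
    return sum(1 for _src, _dst, r in edges if r == rel)


print(sorted(in_degrees(EDGES).items()))
# [('a', 1), ('b', 2), ('c', 2), ('d', 1), ('e', 1), ('f', 1)]
print(min_age(VERTICES))                    # 29
print(count_relationship(EDGES, "follow"))  # 4
```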

Scala API

import org.graphframes.{examples,GraphFrame}

val g: GraphFrame = examples.Graphs.friends  // get example graph

// Display the vertex and edge DataFrames
g.vertices.show()
// +--+-------+---+
// |id|   name|age|
// +--+-------+---+
// | a|  Alice| 34|
// | b|    Bob| 36|
// | c|Charlie| 30|
// | d|  David| 29|
// | e| Esther| 32|
// | f|  Fanny| 36|
// | g|  Gabby| 60|
// +--+-------+---+

g.edges.show()
// +---+---+------------+
// |src|dst|relationship|
// +---+---+------------+
// |  a|  b|      friend|
// |  b|  c|      follow|
// |  c|  b|      follow|
// |  f|  c|      follow|
// |  e|  f|      follow|
// |  e|  d|      friend|
// |  d|  a|      friend|
// |  a|  e|      friend|
// +---+---+------------+

// Import the DataFrame type from Spark SQL
import org.apache.spark.sql.DataFrame

// Get a DataFrame with columns "id" and "inDegree" (in-degree)
val vertexInDegrees: DataFrame = g.inDegrees
vertexInDegrees.show()

// Find the youngest user's age in the graph.
// This queries the vertex DataFrame.
g.vertices.groupBy().min("age").show()

// Count the number of "follows" in the graph.
// This queries the edge DataFrame.
val numFollows = g.edges.filter("relationship = 'follow'").count()
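Besides inDegrees, GraphFrames also exposes outDegrees and degrees (with columns "outDegree" and "degree") in both APIs. The plain-Python sketch below, again using hypothetical helper names rather than the library's code, shows how those counts relate to the edge table: each edge adds 1 to the out-degree of its src and 1 to the in-degree of its dst, and the total degree is the sum of the two.

```python
from collections import Counter

# Hypothetical plain-Python copy of the example edge table (src, dst, relationship).
EDGES = [
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend"),
]


def out_degrees(edges):
    """Count outgoing edges per source id (like grouping edges by src)."""
    return dict(Counter(src for src, _dst, _rel in edges))


def degrees(edges):
    """Total degree: every edge counts once for its src and once for its dst."""
    counts = Counter()
    for src, dst, _rel in edges:
        counts[src] += 1
        counts[dst] += 1
    return dict(counts)


print(sorted(out_degrees(EDGES).items()))
# [('a', 2), ('b', 1), ('c', 1), ('d', 1), ('e', 2), ('f', 1)]
print(sorted(degrees(EDGES).items()))
# [('a', 3), ('b', 3), ('c', 3), ('d', 2), ('e', 3), ('f', 2)]
```

Because every edge contributes exactly 2 to the total, the degree values always sum to twice the number of edges (16 for the 8 edges here).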

Filtering Edges or Vertices

GraphFrames provides an API for filtering a graph based on edge and vertex attributes.

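Note: This API is intended for simple filtering. For more complex use cases, the PropertyGraphFrame model is recommended. PropertyGraphFrame handles the logical structure of the whole graph and provides a more powerful API for selecting any subgraph based on the desired properties and filters.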

Python API

from pyspark.sql import functions as F
from graphframes.examples import Graphs

g = Graphs(spark).friends()  # Get example graph

# Each call returns a new GraphFrame; g itself is unchanged
alice_graph = g.filterVertices(F.col("name") == F.lit("Alice"))
follow_graph = g.filterEdges(F.col("relationship") == F.lit("follow"))

Scala API

import org.apache.spark.sql.functions._
import org.graphframes.{examples,GraphFrame}

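val g: GraphFrame = examples.Graphs.friends  // get example graph

// Each call returns a new GraphFrame; g itself is unchanged
val aliceGraph: GraphFrame = g.filterVertices(col("name") === lit("Alice"))
val followGraph: GraphFrame = g.filterEdges(col("relationship") === lit("follow"))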
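Based on our reading of the GraphFrames docstrings, filterVertices also removes every edge that references a dropped vertex, while filterEdges keeps all vertices (including ones left without edges); please verify this against the version you use. The plain-Python sketch below models that behavior on the example graph. The helpers filter_vertices and filter_edges are hypothetical and only illustrate the semantics, not the library's implementation.

```python
# Hypothetical plain-Python copies of the example graph's tables.
VERTICES = [  # (id, name, age)
    ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30),
    ("d", "David", 29), ("e", "Esther", 32), ("f", "Fanny", 36),
    ("g", "Gabby", 60),
]
EDGES = [  # (src, dst, relationship)
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend"),
]


def filter_vertices(vertices, edges, keep):
    """Keep vertices where keep(v) is true and drop edges touching removed vertices."""
    kept = [v for v in vertices if keep(v)]
    kept_ids = {v[0] for v in kept}
    kept_edges = [e for e in edges if e[0] in kept_ids and e[1] in kept_ids]
    return kept, kept_edges


def filter_edges(vertices, edges, keep):
    """Keep edges where keep(e) is true; every vertex is retained."""
    return list(vertices), [e for e in edges if keep(e)]


# Mirrors the "name == Alice" filter: Alice has no self-loop, so no edges survive.
alice_v, alice_e = filter_vertices(VERTICES, EDGES, lambda v: v[1] == "Alice")
# Dropping David (age 29) also drops the edges e->d and d->a.
older_v, older_e = filter_vertices(VERTICES, EDGES, lambda v: v[2] >= 30)
# Mirrors the "relationship == follow" filter: all 7 vertices are kept.
follow_v, follow_e = filter_edges(VERTICES, EDGES, lambda e: e[2] == "follow")

print(len(alice_v), len(alice_e))    # 1 0
print(len(older_v), len(older_e))    # 6 6
print(len(follow_v), len(follow_e))  # 7 4
```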