Installation
If you are new to Apache Spark, see the Apache Spark documentation and its Quick Start guide for more information.
Spark Version Compatibility
| Component | Spark 3.x (Scala 2.12) | Spark 3.x (Scala 2.13) | Spark 4.x (Scala 2.13) |
|---|---|---|---|
| graphframes | ✓ | ✓ | ✓ |
| graphframes-connect | ✓ | ✓ | ✓ |
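The following examples show how to run the Spark shell with the GraphFrames package. The --packages argument automatically downloads the graphframes package and all of its dependencies.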
Spark 3.x
Spark Shell
$ ./bin/spark-shell --packages io.graphframes:graphframes-spark3_2.12:0.10.0
Alternatively, to force Scala 2.13, use the following command:
$ ./bin/spark-shell --packages io.graphframes:graphframes-spark3_2.13:0.10.0
PySpark
$ pip install graphframes-py==0.10.0
$ ./bin/pyspark --packages io.graphframes:graphframes-spark3_2.12:0.10.0
Spark 4.x
Spark Shell
$ ./bin/spark-shell --packages io.graphframes:graphframes-spark4_2.13:0.10.0
PySpark
$ pip install graphframes-py==0.10.0
$ ./bin/pyspark --packages io.graphframes:graphframes-spark4_2.13:0.10.0
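The package coordinates in the commands above follow a regular pattern: the artifact name encodes the Spark major version and the Scala binary version. As a minimal sketch, the helper below assembles the coordinate and rejects combinations that the compatibility table does not list. The name `graphframes_coordinate` and the `SUPPORTED` set are hypothetical and introduced only for illustration; they are not part of GraphFrames.

```python
# Supported (Spark major version, Scala binary version) pairs,
# copied from the compatibility table in this guide.
SUPPORTED = {(3, "2.12"), (3, "2.13"), (4, "2.13")}


def graphframes_coordinate(spark_major: int, scala: str, version: str = "0.10.0") -> str:
    """Return the Maven coordinate to pass to --packages.

    Hypothetical helper for illustration; not a GraphFrames API.
    """
    if (spark_major, scala) not in SUPPORTED:
        raise ValueError(
            f"unsupported combination: Spark {spark_major}.x with Scala {scala}"
        )
    # Pattern taken from the commands above: graphframes-spark<major>_<scala>
    return f"io.graphframes:graphframes-spark{spark_major}_{scala}:{version}"


if __name__ == "__main__":
    print(graphframes_coordinate(4, "2.13"))
```

The resulting string can be passed unchanged to `--packages` for either `spark-shell` or `pyspark`.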
Spark Connect Server Extension
To add GraphFrames to your Spark Connect server, specify the plugin name:
For Spark 4.x:
./sbin/start-connect-server.sh \
--conf spark.connect.extensions.relation.classes=\
org.apache.spark.sql.graphframes.GraphFramesConnect \
--packages io.graphframes:graphframes-connect-spark4_2.13:0.10.0
For Spark 3.x:
./sbin/start-connect-server.sh \
--conf spark.connect.extensions.relation.classes=\
org.apache.spark.sql.graphframes.GraphFramesConnect \
--packages io.graphframes:graphframes-connect-spark3_2.12:0.10.0
Warning: The GraphFrames Connect Server Extension is not compatible with Databricks-managed Spark Connect. To make it work there, you need to build the GraphFrames Connect Server Extension from source with a vendor-specific flag:
./build/sbt -Dvendor.name=dbx connect/assembly
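When the server is launched from a script rather than typed by hand, it can help to build the argument list programmatically so that the plugin class and the package coordinate stay in sync. The following is a rough sketch under that assumption. The function `connect_server_command` is hypothetical. The plugin class name and script path are taken from the commands in this section, and the package is written in Maven's `group:artifact:version` form, which `--packages` expects.

```python
import shlex
from typing import List

# Plugin class name as given in the server commands in this section.
PLUGIN_CLASS = "org.apache.spark.sql.graphframes.GraphFramesConnect"


def connect_server_command(spark_major: int, scala: str, version: str = "0.10.0") -> List[str]:
    """Build the argument list for starting a Spark Connect server with GraphFrames.

    Hypothetical helper for illustration; not part of GraphFrames or Spark.
    """
    package = f"io.graphframes:graphframes-connect-spark{spark_major}_{scala}:{version}"
    return [
        "./sbin/start-connect-server.sh",
        "--conf",
        f"spark.connect.extensions.relation.classes={PLUGIN_CLASS}",
        "--packages",
        package,
    ]


if __name__ == "__main__":
    # Render the list as a single shell-quoted command line.
    print(shlex.join(connect_server_command(4, "2.13")))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without additional shell quoting.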
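Spark Connect Client
Currently, GraphFrames ships only a PySpark client, which is bundled with the package: pip install graphframes-py==0.10.0. At runtime, if the environment is a Spark Connect environment, the GraphFrames PySpark client automatically handles communication with the GraphFrames Connect server extension.
Messages
The following API is currently exposed: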
message GraphFramesAPI {
  bytes vertices = 1;
  bytes edges = 2;
  oneof method {
    AggregateMessages aggregate_messages = 3;
    BFS bfs = 4;
    ConnectedComponents connected_components = 5;
    DropIsolatedVertices drop_isolated_vertices = 6;
    FilterEdges filter_edges = 7;
    FilterVertices filter_vertices = 8;
    Find find = 9;
    LabelPropagation label_propagation = 10;
    PageRank page_rank = 11;
    ParallelPersonalizedPageRank parallel_personalized_page_rank = 12;
    PowerIterationClustering power_iteration_clustering = 13;
    Pregel pregel = 14;
    ShortestPaths shortest_paths = 15;
    StronglyConnectedComponents strongly_connected_components = 16;
    SVDPlusPlus svd_plus_plus = 17;
    TriangleCount triangle_count = 18;
    Triplets triplets = 19;
  }
}
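In the message above, each request carries the graph as two byte fields plus a `oneof` group, which in Protocol Buffers means that at most one of the listed methods is set at a time and setting another member replaces the previous one. The sketch below imitates that behavior in plain Python to make the shape of a request concrete. `GraphFramesRequest` is a hypothetical stand-in, not the generated protobuf class, and the byte payloads are treated as opaque.

```python
from dataclasses import dataclass
from typing import Optional

# Field numbers of the `method` oneof, copied from the message definition above.
METHOD_FIELDS = {
    "aggregate_messages": 3,
    "bfs": 4,
    "connected_components": 5,
    "drop_isolated_vertices": 6,
    "filter_edges": 7,
    "filter_vertices": 8,
    "find": 9,
    "label_propagation": 10,
    "page_rank": 11,
    "parallel_personalized_page_rank": 12,
    "power_iteration_clustering": 13,
    "pregel": 14,
    "shortest_paths": 15,
    "strongly_connected_components": 16,
    "svd_plus_plus": 17,
    "triangle_count": 18,
    "triplets": 19,
}


@dataclass
class GraphFramesRequest:
    """Hypothetical model of a GraphFramesAPI message (illustration only)."""

    vertices: bytes  # opaque payload, field 1
    edges: bytes  # opaque payload, field 2
    method: Optional[str] = None  # name of the oneof member that is set, if any

    def set_method(self, name: str) -> None:
        """Select one method; any previously selected method is replaced."""
        if name not in METHOD_FIELDS:
            raise ValueError(f"unknown method: {name}")
        self.method = name

    def which_oneof(self) -> Optional[str]:
        """Return the selected method name, or None if nothing is set."""
        return self.method
```

For example, calling `set_method("page_rank")` and then `set_method("bfs")` on the same request leaves only `bfs` selected, mirroring how a `oneof` keeps a single member.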
Building GraphFrames from Source
./build/sbt package
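Nightly Builds
The GraphFrames project publishes snapshot (nightly) builds to the Central Portal Snapshots repository. See the relevant section of the Sonatype documentation to learn how to use snapshot versions in your project.
Group ID: io.graphframes
Artifact IDs:
- graphframes-spark3_2.12
- graphframes-spark3_2.13
- graphframes-connect-spark3_2.12
- graphframes-connect-spark3_2.13
- graphframes-graphx-spark3_2.12
- graphframes-graphx-spark3_2.13
- graphframes-spark4_2.13
- graphframes-connect-spark4_2.13
- graphframes-graphx-spark4_2.13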