pyspark.SparkContext.union

SparkContext.union(rdds: List[pyspark.rdd.RDD[T]]) → pyspark.rdd.RDD[T]
Build the union of a list of RDDs.

This supports unions() of RDDs with different serialized formats, although this forces them to be reserialized using the default serializer.
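The union is a concatenation of the inputs' elements, not a set union, so duplicates are preserved. A minimal plain-Python sketch of these semantics, using lists in place of RDDs (the `union` helper here is illustrative, not part of the PySpark API):

```python
from typing import List, TypeVar

T = TypeVar("T")

def union(rdds: List[List[T]]) -> List[T]:
    # Concatenate the elements of every input, keeping order and
    # duplicates -- mirroring how SparkContext.union chains the
    # partitions of its input RDDs rather than deduplicating them.
    out: List[T] = []
    for rdd in rdds:
        out.extend(rdd)
    return out

print(union([["Hello"], ["World!"], ["Hello"]]))
# ['Hello', 'World!', 'Hello'] -- the duplicate "Hello" survives
```

To remove duplicates in real Spark code, apply `distinct()` to the unioned RDD afterwards.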
New in version 0.7.0.

See Also

Examples
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # generate a text RDD
...     with open(os.path.join(d, "union-text.txt"), "w") as f:
...         _ = f.write("Hello")
...     text_rdd = sc.textFile(d)
...
...     # generate another RDD
...     parallelized = sc.parallelize(["World!"])
...
...     unioned = sorted(sc.union([text_rdd, parallelized]).collect())
>>> unioned
['Hello', 'World!']