pyspark.SparkContext.union

SparkContext.union(rdds: List[pyspark.rdd.RDD[T]]) → pyspark.rdd.RDD[T] [source]

Build the union of a list of RDDs.

This supports unions of RDDs with different serialized formats, although this forces them to be reserialized using the default serializer.

New in version 0.7.0.

See Also

RDD.union()

Examples

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     # generate a text RDD
...     with open(os.path.join(d, "union-text.txt"), "w") as f:
...         _ = f.write("Hello")
...     text_rdd = sc.textFile(d)
...
...     # generate another RDD
...     parallelized = sc.parallelize(["World!"])
...
...     unioned = sorted(sc.union([text_rdd, parallelized]).collect())
>>> unioned
['Hello', 'World!']