集合运算符
描述
集合运算符用于将两个输入关系组合成一个单一的关系。Spark SQL 支持三种类型的集合运算符:
-
EXCEPT
或MINUS
-
INTERSECT
-
UNION
请注意,输入关系必须具有相同数量的列,并且相应列的数据类型必须兼容。
排除
EXCEPT
和
EXCEPT ALL
返回在一个关系中找到但在另一个关系中未找到的行。
EXCEPT
(另外,
EXCEPT DISTINCT
)仅取唯一行,而
EXCEPT ALL
不会从结果行中删除重复项。请注意,
MINUS
是
EXCEPT
的别名。
语法
[ ( ] relation [ ) ] EXCEPT | MINUS [ ALL | DISTINCT ] [ ( ] relation [ ) ]
示例
-- 使用 number1 和 number2 表格在此页面演示集合运算符。
SELECT * FROM number1;
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
+---+
SELECT * FROM number2;
+---+
| c|
+---+
| 5|
| 1|
| 2|
| 2|
+---+
SELECT c FROM number1 EXCEPT SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS SELECT c FROM number2;
+---+
| c|
+---+
| 3|
| 4|
+---+
SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
SELECT c FROM number1 MINUS ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 3|
| 4|
+---+
交集
INTERSECT
和
INTERSECT ALL
返回在两个关系中都找到的行。
INTERSECT
(或者
INTERSECT DISTINCT
)只取不同的行,而
INTERSECT ALL
则不从结果行中移除重复项。
语法
[ ( ] relation [ ) ] 交集 [ 所有 | 唯一 ] [ ( ] relation [ ) ]
示例
(SELECT c FROM number1) INTERSECT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
+---+
(SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 2|
| 2|
+---+
联合
UNION
和
UNION ALL
返回在任一关系中找到的行。
UNION
(另外,
UNION DISTINCT
)仅取不同的行,而
UNION ALL
不会从结果行中删除重复项。
语法
[ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ]
示例
(SELECT c FROM number1) UNION (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
(SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2);
+---+
| c|
+---+
| 1|
| 3|
| 5|
| 4|
| 2|
+---+
SELECT c FROM number1 UNION ALL (SELECT c FROM number2);
+---+
| c|
+---+
| 3|
| 1|
| 2|
| 2|
| 3|
| 4|
| 5|
| 1|
| 2|
| 2|
+---+