mars.dataframe.Series.str.split

Series.str.split(pat=None, n=-1, expand=False)

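Split strings around the given separator/delimiter.

Splits each string in the Series/Index from the beginning, at the specified delimiter string. Equivalent to str.split().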
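
As a rough illustration of the equivalence with the built-in str.split(), the following plain-Python sketch applies the built-in to each element and passes non-string values (such as NaN) through unchanged. The helper name split_each is hypothetical and not part of Mars; it models the semantics only, not how Mars executes the operation.

```python
import math


def split_each(values, pat=None):
    """Apply the built-in str.split to every string element.

    Hypothetical helper for illustration only: non-string entries
    (for example NaN) are returned unchanged.
    """
    result = []
    for value in values:
        if isinstance(value, str):
            result.append(value.split(pat))
        else:
            result.append(value)
    return result


data = ["this is a regular sentence", "a/b", float("nan")]
split_result = split_each(data)
```

With pat left as None, the first element is split on whitespace, the second has no whitespace and stays a one-element list, and the NaN entry is left as it is.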

Parameters

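pat (str, optional) – String or regular expression to split on. If not specified, split on whitespace.
n (int, default -1 (all)) – Limit on the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
expand (bool, default False) –
Expand the split strings into separate columns.
- If True, return a DataFrame/MultiIndex with expanded dimensionality.
- If False, return a Series/Index containing lists of strings.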
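
One way to read the n parameter is as a mapping onto the maxsplit argument of the built-in str.split(). The sketch below assumes that reading; to_maxsplit is a hypothetical helper, not a Mars function. Note that the built-in treats maxsplit=0 as "perform no splits", which differs from n=0 as described above.

```python
def to_maxsplit(n):
    """Translate the documented n values to a str.split maxsplit value.

    Hypothetical helper: None, 0 and -1 all mean "no limit",
    which the built-in str.split expresses as -1.
    """
    if n is None or n == 0 or n == -1:
        return -1
    return n


text = "this is a regular sentence"
all_parts = text.split(None, to_maxsplit(0))      # no limit on splits
limited_parts = text.split(None, to_maxsplit(2))  # at most two splits
```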

Returns

Type matches the caller unless expand=True (see Notes).

Return type

Series, Index, DataFrame or MultiIndex

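See also

Series.str.split: Split strings around the given separator/delimiter.
Series.str.rsplit: Split strings around the given separator/delimiter, starting from the right.
Series.str.join: Join lists contained as elements in the Series/Index with the passed delimiter.
str.split: Standard library version for split.
str.rsplit: Standard library version for rsplit.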

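Notes

The handling of the n keyword depends on the number of splits found:

If the number of splits found > n, only the first n splits are made
If the number of splits found <= n, all splits are made
If, for a given row, the number of splits found < n, None is appended as padding up to n when expand=True

With expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.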
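
The padding behaviour can be sketched in plain Python. This is only a model of the rule above under the assumption that shorter rows are padded with None to the width of the widest row; expand_rows is a hypothetical name, and the sketch ignores missing values and Mars internals.

```python
def expand_rows(rows, pat=None, n=-1):
    """Split every string and pad shorter results with None.

    Hypothetical helper: returns a list of equal-length lists,
    mimicking the column layout produced with expand=True.
    """
    parts = [row.split(pat, n) for row in rows]
    width = max(len(p) for p in parts)
    return [p + [None] * (width - len(p)) for p in parts]


rows = [
    "this is a regular sentence",
    "https://docs.python.org/3/tutorial/index.html",
]
table = expand_rows(rows)
limited = expand_rows(rows, n=2)
```

In this sketch the URL contains no whitespace, so its row holds one value followed by None in every remaining column.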

Examples

>>> import numpy as np
>>> import mars.dataframe as md
>>> s = md.Series(["this is a regular sentence",
...                "https://docs.python.org/3/tutorial/index.html",
...                np.nan])
>>> s.execute()
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                              NaN
dtype: object

With the default settings, the string is split on whitespace.

>>> s.str.split().execute()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit().execute()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The n parameter can be used to limit the number of splits on the delimiter. With a limit in place, the outputs of split and rsplit differ.

>>> s.str.split(n=2).execute()
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

>>> s.str.rsplit(n=2).execute()
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The pat parameter can be used to split on other characters.

>>> s.str.split(pat = "/").execute()
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                  NaN
dtype: object
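
The display above is truncated, and the empty element after "https:" may look surprising. The built-in str.split(), which this method is documented as equivalent to, shows the full picture for a single string: two consecutive "/" characters produce an empty string between them.

```python
url = "https://docs.python.org/3/tutorial/index.html"

# Splitting on "/" keeps the empty string between the two slashes of "//".
url_parts = url.split("/")

# A string without the separator comes back as a one-element list.
sentence_parts = "this is a regular sentence".split("/")
```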

When using expand=True, the split elements expand out into separate columns. If NaN is present, it is propagated across all columns during the split.

>>> s.str.split(expand=True).execute()
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                            NaN   NaN   NaN      NaN       NaN

For slightly more complex use cases, such as splitting the HTML document name from a URL, a combination of parameter settings can be used.

>>> s.str.rsplit("/", n=1, expand=True).execute()
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                 NaN         NaN

Remember to escape special characters when explicitly using regular expressions.

>>> s = md.Series(["1+1=2"])
>>> s.str.split(r"\+|=", expand=True).execute()
     0    1    2
0    1    1    2
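
When the separators are literal characters, one option is to let the standard library do the escaping instead of writing the backslashes by hand. The sketch below uses re.escape and re.split from Python's re module on a single string; it illustrates the pattern-building step only, and the variable names are chosen for this example.

```python
import re

separators = ["+", "="]

# re.escape quotes any regex metacharacters, so "+" is matched literally.
pattern = "|".join(re.escape(sep) for sep in separators)

equation_parts = re.split(pattern, "1+1=2")
```

The resulting pattern string could then be passed as pat in the same way as the hand-written r"\+|=" above.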