mars.dataframe.Series.str.contains

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

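Test whether a pattern or regular expression is contained within each string of a Series or Index.

Return a boolean Series or Index based on whether the given pattern or regular expression is contained within each string of the Series or Index.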

Parameters

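pat (str) – Character sequence or regular expression.
case (bool, default True) – If True, matching is case sensitive.
flags (int, default 0 (no flags)) – Flags to pass through to the re module, e.g. re.IGNORECASE.
na (scalar, optional) – Fill value for missing values. The default depends on the dtype of the array. For object dtype, numpy.nan is used. For StringDtype, pandas.NA is used.
regex (bool, default True) –
If True, pat is assumed to be a regular expression.

If False, pat is treated as a literal string.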
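
How these parameters interact can be sketched for a single element in plain Python. This is only an approximate model built on the standard re module, not Mars's implementation; the helper name contains_one is hypothetical, and it returns na exactly as given (None by default) rather than choosing a dtype-dependent missing value.

```python
import re

def contains_one(value, pat, case=True, flags=0, na=None, regex=True):
    """Hypothetical per-element model of the parameters described above."""
    if not isinstance(value, str):
        # Missing (non-string) entries are filled with `na`.
        return na
    if regex:
        if not case:
            # Case-insensitive matching is expressed as a regex flag.
            flags |= re.IGNORECASE
        # re.search finds the pattern anywhere in the string.
        return re.search(pat, value, flags) is not None
    # Literal mode: a plain substring test.
    if case:
        return pat in value
    return pat.lower() in value.lower()

data = ['Mouse', 'dog', 'house and parrot', '23', None]
literal = [contains_one(v, 'og', regex=False) for v in data]
filled = [contains_one(v, 'og', na=False) for v in data]
ignore_case = [contains_one(v, 'PARROT', case=False) for v in data]
print(literal)
print(filled)
print(ignore_case)
```

The three lists correspond to a literal match, a match with missing values filled by False, and a case-insensitive regex match.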

Returns

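A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element.

Return type

Series or Index of boolean values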

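See also

match: Analogous, but stricter, relying on re.match instead of re.search.
Series.str.startswith: Test if the start of each string element matches a pattern.
Series.str.endswith: Same as startswith, but tests the end of the string.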
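
The difference between these related methods can be illustrated with the standard library alone. The snippet below is a sketch using Python's re module and str methods on a single string, not the Mars API itself; the variable names are chosen for illustration.

```python
import re

text = 'house and parrot'

# re.search looks for the pattern anywhere in the string (the contains behaviour).
found_anywhere = re.search('parrot', text) is not None

# re.match only succeeds if the pattern matches at the start of the string.
found_at_start = re.match('parrot', text) is not None

# startswith / endswith test literal prefixes and suffixes.
starts = text.startswith('house')
ends = text.endswith('parrot')

print(found_anywhere, found_at_start, starts, ends)
```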

Examples

Returning a Series of booleans using only a literal pattern.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> s1 = md.Series(['Mouse', 'dog', 'house and parrot', '23', mt.NaN])
>>> s1.str.contains('og', regex=False).execute()
0    False
1     True
2    False
3    False
4      NaN
dtype: object

Returning an Index of booleans using only a literal pattern.

>>> ind = md.Index(['Mouse', 'dog', 'house and parrot', '23.0', mt.NaN])
>>> ind.str.contains('23', regex=False).execute()
Index([False, False, False, True, nan], dtype='object')

Specifying case sensitivity using case.

>>> s1.str.contains('oG', case=True, regex=True).execute()
0    False
1    False
2    False
3    False
4      NaN
dtype: object

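Specifying na to be False instead of NaN replaces NaN values with False. If the Series or Index does not contain NaN values, the resulting dtype will be bool; otherwise it will be object.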

>>> s1.str.contains('og', na=False, regex=True).execute()
0    False
1     True
2    False
3    False
4    False
dtype: bool

Matching 'house' or 'dog': the result is True when either expression occurs in a string.

>>> s1.str.contains('house|dog', regex=True).execute()
0    False
1     True
2     True
3    False
4      NaN
dtype: object

Ignoring case sensitivity using flags with a regular expression.

>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True).execute()
0    False
1    False
2     True
3    False
4      NaN
dtype: object

Matching any digit using a regular expression.

>>> s1.str.contains('\\d', regex=True).execute()
0    False
1    False
2    False
3     True
4      NaN
dtype: object

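Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, '.0' as a regular expression matches any character followed by a 0.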

>>> s2 = md.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True).execute()
0     True
1     True
2    False
3     True
4    False
dtype: bool
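
One way to avoid this pitfall is to escape the pattern before using it as a regular expression. The sketch below shows the idea with Python's standard re module on a plain list holding the same values as s2; it illustrates the regex semantics only and does not call Mars.

```python
import re

values = ['40', '40.0', '41', '41.0', '35']

# As a regex, '.0' means "any character followed by 0", so '40' also matches.
as_regex = [re.search('.0', v) is not None for v in values]

# re.escape turns '.' into a literal dot, so only strings containing '.0' match.
escaped = re.escape('.0')
as_literal = [re.search(escaped, v) is not None for v in values]

# A plain substring test gives the same answer as the escaped pattern.
as_substring = ['.0' in v for v in values]

print(as_regex)
print(as_literal)
print(as_substring)
```

With the method documented here, passing regex=False (as in the first example) requests the same literal treatment of pat.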