cudf.core.column.string.StringMethods.count

StringMethods.count(pat: str, flags: int = 0) → SeriesOrIndex

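Count occurrences of pattern in each string of the Series/Index.

This function counts how many times a given regular expression pattern matches within each string element of the Series or Index.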

Parameters:

pat (str or compiled regex): A valid regular expression.
flags (int, default 0, meaning no flags): Flags to pass through to the regex engine (e.g. re.MULTILINE).

Returns:

Series or Index

Examples

>>> import cudf
>>> s = cudf.Series(['A', 'B', 'Aaba', 'Baca', None, 'CABA', 'cat'])
>>> s.str.count('a')
0       0
1       0
2       2
3       2
4    <NA>
5       0
6       1
dtype: int32
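
To make the counts above easier to verify by hand, here is a minimal CPU-only sketch built on Python's standard `re` module. It is not how cuDF computes the result on the GPU; the helper name `count_reference` is hypothetical and exists only for this illustration. It counts non-overlapping matches per element, treats the pattern as case-sensitive, and keeps missing values missing.

```python
import re
from typing import List, Optional


def count_reference(
    values: List[Optional[str]], pat: str, flags: int = 0
) -> List[Optional[int]]:
    """Hypothetical CPU reference: count non-overlapping regex matches per element."""
    compiled = re.compile(pat, flags)
    # None stays None, mirroring the <NA> row in the cuDF output above.
    return [
        None if v is None else sum(1 for _ in compiled.finditer(v))
        for v in values
    ]


data = ['A', 'B', 'Aaba', 'Baca', None, 'CABA', 'cat']
counts = count_reference(data, 'a')
print(counts)  # [0, 0, 2, 2, None, 0, 1]
```

Note that 'CABA' yields 0 because the lowercase pattern 'a' does not match uppercase 'A'.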

Escape '$' to find the literal dollar sign.

>>> s = cudf.Series(['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat'])
>>> s.str.count(r'\$')
0    1
1    0
2    1
3    2
4    2
5    0
dtype: int32
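
When the text to search for comes from a variable, the escaping can be done programmatically. The sketch below uses Python's standard `re.escape` and, for comparison, counts matches on the CPU with `re.findall`. For '$' the escaped result is the same backslash-dollar pattern used in the example above; whether every escape sequence produced by `re.escape` is accepted by cuDF's regex engine is an assumption, not something this page confirms.

```python
import re

literal = '$'
# re.escape(literal) yields a backslash followed by '$'.
pattern = re.escape(literal)

data = ['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat']

# Escaped: counts literal dollar signs in each string.
escaped_counts = [len(re.findall(pattern, s)) for s in data]
print(escaped_counts)  # [1, 0, 1, 2, 2, 0]

# Unescaped: in Python's re, '$' is an end-of-string anchor, so each of
# these newline-free strings produces exactly one empty match.
anchor_counts = [len(re.findall('$', s)) for s in data]
print(anchor_counts)  # [1, 1, 1, 1, 1, 1]
```

The escaped counts agree with the cuDF output shown above, while the unescaped pattern counts anchor positions rather than characters.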

This is also available on Index.

>>> index = cudf.Index(['A', 'A', 'Aaba', 'cat'])
>>> index.str.count('a')
Index([0, 0, 2, 1], dtype='int64')

Pandas Compatibility Note

pandas.Series.str.count()

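The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Some characters need to be escaped when passed in pat. For example, '$' has a special meaning in regular expressions and must be escaped to match the literal character.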
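
As a rough illustration of what the two supported flags change, the following sketch uses Python's standard `re` module on the CPU. It shows the usual regex meaning of re.MULTILINE and re.DOTALL; that cuDF produces the same counts for these inputs is an assumption. `SUPPORTED_FLAGS` and `uses_only_supported_flags` are hypothetical names for this sketch, not part of cuDF.

```python
import re

text = 'apple\navocado\nbanana'

# Without flags, '^' anchors only at the start of the whole string.
default_count = len(re.findall(r'^a', text))
# With re.MULTILINE, '^' also anchors right after each newline.
multiline_count = len(re.findall(r'^a', text, re.MULTILINE))
print(default_count, multiline_count)  # 1 2

# Without flags, '.' does not match a newline; with re.DOTALL it does.
dot_default = len(re.findall(r'e.a', text))
dot_all = len(re.findall(r'e.a', text, re.DOTALL))
print(dot_default, dot_all)  # 0 1

# Hypothetical pre-check mirroring the restriction stated in the note.
SUPPORTED_FLAGS = int(re.MULTILINE | re.DOTALL)


def uses_only_supported_flags(flags: int) -> bool:
    """Return True if no bits outside MULTILINE/DOTALL are set."""
    return (int(flags) & ~SUPPORTED_FLAGS) == 0


print(uses_only_supported_flags(re.MULTILINE | re.DOTALL))  # True
print(uses_only_supported_flags(re.IGNORECASE))             # False
```

A check like this can be run before calling `str.count` so that an unsupported flag such as re.IGNORECASE is caught early in calling code.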