cudf.core.column.string.StringMethods.character_ngrams

StringMethods.character_ngrams(n: int = 2, as_list: bool = False) → SeriesOrIndex

Generate n-grams from the characters of each string in a column of strings.

Parameters:

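n : int
    The degree of the n-gram, i.e. the number of consecutive characters in each n-gram. Defaults to 2, which produces bigrams.
as_list : bool
    Set to True to return the n-grams in a list column, where each list element holds the n-grams of the corresponding string.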
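To make the role of n concrete, here is a minimal pure-Python sketch of the sliding window applied to one string. It is not cuDF's implementation; the helper name char_ngrams is hypothetical, and returning an empty list for strings shorter than n is this sketch's own choice, not confirmed cuDF behavior.

```python
from typing import List


def char_ngrams(s: str, n: int = 2) -> List[str]:
    """Hypothetical helper: every run of n consecutive characters in s, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A string of length L has L - n + 1 windows; range() is empty when L < n,
    # so this sketch returns [] for strings shorter than n (an assumption).
    return [s[i : i + n] for i in range(len(s) - n + 1)]


print(char_ngrams("abcd", 2))  # ['ab', 'bc', 'cd']
print(char_ngrams("xyz", 3))   # ['xyz']
```

The results match the per-row values in the examples below for 'abcd' with n=2 and 'xyz' with n=3.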

Examples

>>> import cudf
>>> str_series = cudf.Series(['abcd','efgh','xyz'])
>>> str_series.str.character_ngrams(2)
0    ab
0    bc
0    cd
1    ef
1    fg
1    gh
2    xy
2    yz
dtype: object
>>> str_series.str.character_ngrams(3)
0    abc
0    bcd
1    efg
1    fgh
2    xyz
dtype: object
>>> str_series.str.character_ngrams(3, as_list=True)
0    [abc, bcd]
1    [efg, fgh]
2         [xyz]
dtype: list
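The flat output (as_list=False) and the list output (as_list=True) hold the same n-grams: the flat form is the list form "exploded", with each n-gram keeping the index of the string it came from, which is why index 0 repeats. The pure-Python sketch below illustrates only that relationship; the helper name explode is hypothetical, and the input lists are copied from the as_list=True example above.

```python
from typing import Dict, List, Tuple

# Per-row n-gram lists, copied from the as_list=True example output (n=3).
as_lists: Dict[int, List[str]] = {0: ["abc", "bcd"], 1: ["efg", "fgh"], 2: ["xyz"]}


def explode(rows: Dict[int, List[str]]) -> List[Tuple[int, str]]:
    """Hypothetical helper: flatten {row_index: [ngram, ...]} into (row_index, ngram) pairs."""
    # Each n-gram keeps the index of its source row, so indices repeat.
    return [(idx, gram) for idx, grams in rows.items() for gram in grams]


for idx, gram in explode(as_lists):
    print(idx, gram)
```

The printed pairs match the character_ngrams(3) output shown above, row for row.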