mars.dataframe.Index.drop_duplicates

Index.drop_duplicates(keep='first', method='auto')

Return an Index with duplicate values removed.

Parameters

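keep ({'first', 'last', False}, default 'first') – Method to handle dropping duplicates:

'first': drop duplicates except for the first occurrence.
'last': drop duplicates except for the last occurrence.
False: drop all duplicates.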
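The three keep options can be modelled in plain Python to make the semantics concrete. This is a minimal illustrative sketch, not how Mars implements the operation (Mars executes it on chunked data, and the method parameter is not modelled here); the function name drop_duplicates_model is hypothetical.

```python
from collections import Counter


def drop_duplicates_model(values, keep="first"):
    """Hypothetical pure-Python model of the keep semantics (illustration only)."""
    if keep == "first":
        # Keep the first occurrence of each value, in original order.
        seen, out = set(), []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Scan from the end so the last occurrence wins, then restore order.
        seen, out = set(), []
        for v in reversed(values):
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out[::-1]
    if keep is False:
        # Drop every value that appears more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


data = ['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo']
print(drop_duplicates_model(data, keep='first'))  # ['lame', 'cow', 'beetle', 'hippo']
print(drop_duplicates_model(data, keep='last'))   # ['cow', 'beetle', 'lame', 'hippo']
print(drop_duplicates_model(data, keep=False))    # ['cow', 'beetle', 'hippo']
```

The 'last' branch scans in reverse so that the surviving entry sits at the position of the last occurrence, which matches the ordering shown in the examples.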

Returns

deduplicated

Return type

Index

See also

Examples

Generate a Mars Index with duplicate values.

>>> import mars.dataframe as md

>>> idx = md.Index(['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo'])

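The keep parameter controls which duplicates are removed. The value 'first' keeps the first occurrence of each set of duplicated entries. The default value of keep is 'first'.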

>>> idx.drop_duplicates(keep='first').execute()
Index(['lame', 'cow', 'beetle', 'hippo'], dtype='object')

The value 'last' keeps the last occurrence of each set of duplicated entries.

>>> idx.drop_duplicates(keep='last').execute()
Index(['cow', 'beetle', 'lame', 'hippo'], dtype='object')

The value False discards all sets of duplicated entries.

>>> idx.drop_duplicates(keep=False).execute()
Index(['cow', 'beetle', 'hippo'], dtype='object')
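Because Mars DataFrame mirrors the pandas API, the same results can be cross-checked eagerly in pandas. The sketch below assumes pandas is installed and uses pandas directly rather than Mars. It also shows that, in pandas, drop_duplicates(keep=k) selects the same entries as masking out idx.duplicated(keep=k); whether Mars exposes an equivalent Index.duplicated is not covered here.

```python
import pandas as pd

# Same data as the Mars example, but as an eager pandas Index.
idx = pd.Index(['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo'])

results = {}
for keep in ('first', 'last', False):
    via_drop = idx.drop_duplicates(keep=keep)
    # duplicated() flags entries that drop_duplicates would remove.
    via_mask = idx[~idx.duplicated(keep=keep)]
    assert via_drop.equals(via_mask)
    results[keep] = list(via_drop)
    print(keep, results[keep])
```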