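Supported pandas API

The following tables show which pandas APIs are implemented in pandas API on Spark and which are not. Some pandas APIs do not implement the full set of parameters, so the third column lists the missing parameters for each API.

  • 'Y' in the second column means the API is implemented, including all of its parameters.

  • 'N' means it is not implemented yet.

  • 'P' means it is partially implemented, with some parameters missing.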
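As a minimal sketch of how the three status codes can be read before calling an API, the snippet below copies three rows from the DataFrame table in this document into a small lookup. The `SUPPORT` dictionary and the `unsupported_reason` helper are hypothetical names introduced only for illustration; they are not part of pandas or PySpark.

```python
from typing import Dict, FrozenSet, Tuple

# A few rows copied from the DataFrame table in this document:
# API name -> (status code, parameters listed as missing).
SUPPORT: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "abs": ("Y", frozenset()),
    "astype": ("P", frozenset({"copy", "errors"})),
    "combine": ("N", frozenset()),
}


def unsupported_reason(api: str, *params: str) -> str:
    """Return '' if the table covers the call, otherwise a short reason."""
    status, missing = SUPPORT[api]
    if status == "N":
        return f"{api} is not implemented"
    # For 'P' entries, only the parameters listed as missing are a problem.
    blocked = sorted(missing.intersection(params))
    if status == "P" and blocked:
        return f"{api} does not accept: {', '.join(blocked)}"
    return ""
```

For example, `unsupported_reason("astype", "errors")` reports the missing parameter, while `unsupported_reason("abs")` returns an empty string because `abs()` is marked 'Y'.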

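All APIs in the lists below compute data with distributed execution, except those that require local execution by design. For example, DataFrame.to_numpy() requires collecting the data to the driver side.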
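Because a call such as DataFrame.to_numpy() gathers its input on the driver, one common precaution is to bound the number of rows first. The helper below is a sketch under that assumption: `collect_head` and its `max_rows` parameter are hypothetical names, and the function relies only on the `head()` and `to_numpy()` methods, so it works on any object exposing them.

```python
from typing import Any


def collect_head(frame: Any, max_rows: int = 1000) -> Any:
    """Bring at most `max_rows` rows to the driver as a local array.

    `frame` is assumed to be a pandas-on-Spark DataFrame; any object with
    compatible `head()` and `to_numpy()` methods works as well.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # head() limits the row count; to_numpy() is the step that collects
    # the remaining rows to the driver.
    return frame.head(max_rows).to_numpy()
```

With PySpark installed, one way to call it would be `collect_head(ps.range(10), max_rows=5)`, where `ps` stands for the `pyspark.pandas` module.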

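If you need a pandas API or parameter that is not implemented, you can create an Apache Spark JIRA to request it, or contribute it yourself.

The API list is updated based on the latest pandas official API reference.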

CategoricalIndex API

API

Implemented

Missing parameters

add_categories()

Y

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

as_ordered()

Y

as_unordered()

Y

asof()

Y

asof_locs

N

astype()

P

copy

copy()

Y

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

groupby

N

holds_integer()

Y

identical()

Y

infer_objects

N

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_numeric()

Y

is_object()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

Y

memory_usage

N

min()

Y

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

remove_categories()

Y

remove_unused_categories()

Y

rename()

Y

rename_categories()

Y

reorder_categories()

Y

repeat()

P

axis

searchsorted

N

set_categories()

Y

set_names()

Y

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

DataFrame API

API

Implemented

Missing parameters

abs()

Y

add()

P

axis , fill_value , level

add_prefix()

P

axis

add_suffix()

P

axis

agg()

P

axis

aggregate()

P

axis

align()

P

broadcast_axis , fill_axis , fill_value , level , limit and more. See the pandas.DataFrame.align and pyspark.pandas.DataFrame.align for detail.

all()

Y

any()

P

skipna

apply()

P

raw , result_type

applymap()

P

na_action

asfreq

N

asof

N

assign()

Y

astype()

P

copy , errors

at_time()

Y

backfill()

P

downcast

between_time()

P

inclusive

bfill()

P

downcast

bool()

Y

boxplot()

P

ax , backend , by , column , figsize and more. See the pandas.DataFrame.boxplot and pyspark.pandas.DataFrame.boxplot for detail.

clip()

P

axis , inplace

combine

N

combine_first()

Y

compare

N

convert_dtypes

N

copy()

Y

corr()

P

numeric_only

corrwith()

P

numeric_only

count()

Y

cov()

P

numeric_only

cummax()

P

axis

cummin()

P

axis

cumprod()

P

axis

cumsum()

P

axis

describe()

P

exclude , include

diff()

Y

div()

P

axis , fill_value , level

divide()

P

axis , fill_value , level

dot()

Y

drop()

P

errors , inplace , level

drop_duplicates()

Y

droplevel()

Y

dropna()

P

ignore_index

duplicated()

Y

eq()

P

axis , level

equals()

Y

eval()

Y

ewm()

P

adjust , axis , method , times

expanding()

P

axis , method

explode()

Y

ffill()

P

downcast

fillna()

P

downcast

filter()

Y

first()

Y

first_valid_index()

Y

floordiv()

P

axis , fill_value , level

ge()

P

axis , level

get()

Y

groupby()

P

group_keys , level , observed , sort

gt()

P

axis , level

head()

Y

hist()

P

ax , backend , by , column , data and more. See the pandas.DataFrame.hist and pyspark.pandas.DataFrame.hist for detail.

idxmax()

P

numeric_only , skipna

idxmin()

P

numeric_only , skipna

infer_objects

N

info()

P

memory_usage , show_counts

insert()

Y

interpolate()

P

axis , downcast , inplace

isetitem

N

isin()

Y

isna()

Y

isnull()

Y

items()

Y

iterrows()

Y

itertuples()

Y

join()

P

other , sort , validate

keys()

Y

kurt()

Y

kurtosis()

Y

last()

Y

last_valid_index()

Y

le()

P

axis , level

lt()

P

axis , level

mask()

P

axis , inplace , level

max()

Y

mean()

Y

median()

Y

melt()

P

col_level , ignore_index

memory_usage

N

merge()

P

copy , indicator , sort , validate

min()

Y

mod()

P

axis , fill_value , level

mode()

Y

mul()

P

axis , fill_value , level

multiply()

P

axis , fill_value , level

ne()

P

axis , level

nlargest()

Y

notna()

Y

notnull()

Y

nsmallest()

Y

nunique()

Y

pad()

P

downcast

pct_change()

P

fill_method , freq , limit

pipe()

Y

pivot()

Y

pivot_table()

P

dropna , margins , margins_name , observed , sort

pop()

Y

pow()

P

axis , fill_value , level

prod()

Y

product()

Y

quantile()

P

interpolation , method

query()

Y

radd()

P

axis , fill_value , level

rank()

P

axis , na_option , pct

rdiv()

P

axis , fill_value , level

reindex()

P

level , limit , method , tolerance

reindex_like()

P

limit , method , tolerance

rename()

P

copy

rename_axis()

P

copy

reorder_levels

N

replace()

Y

resample()

P

axis , convention , group_keys , kind , level and more. See the pandas.DataFrame.resample and pyspark.pandas.DataFrame.resample for detail.

reset_index()

DatetimeIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

as_unit

N

asof()

Y

asof_locs

N

astype()

P

copy

ceil()

Y

copy()

Y

day_name()

Y

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

floor()

Y

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

groupby

N

holds_integer()

Y

identical()

Y

indexer_at_time()

Y

indexer_between_time()

Y

infer_objects

N

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_numeric()

Y

is_object()

Y

isin()

P

level

isna()

Y

isnull()

Y

isocalendar

N

item()

Y

join

N

map()

Y

max()

P

axis , skipna

mean

N

memory_usage

N

min()

P

axis , skipna

month_name()

Y

normalize()

Y

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

round()

Y

searchsorted

N

set_names()

Y

shift()

P

freq

slice_indexer

N

slice_locs

N

snap

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

std

N

strftime()

Y

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_julian_date

N

to_list()

Y

to_numpy()

P

na_value

to_period

N

to_pydatetime

N

to_series()

P

index

tolist()

Y

transpose()

Y

tz_convert

N

tz_localize

N

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

Index API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

Y

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

groupby

N

holds_integer()

Y

identical()

Y

infer_objects

N

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_numeric()

Y

is_object()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

searchsorted

N

set_names()

Y

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_numpy()

P

na_value

to_series()

P

index

tolist()

MultiIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

name , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equal_levels()

Y

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_loc_level

N

get_locs

N

get_slice_bound

N

groupby

N

holds_integer()

Y

identical()

Y

infer_objects

N

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_numeric()

Y

is_object()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

remove_unused_levels

N

rename()

P

level , names

reorder_levels

N

repeat()

P

axis

searchsorted

N

set_codes

N

set_levels

N

set_names()

Y

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

swaplevel()

Y

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

P

allow_duplicates

to_list()

Y

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

truncate

N

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

Series API

API

Implemented

Missing parameters

abs()

Y

add()

P

axis , level

add_prefix()

P

axis

add_suffix()

P

axis

agg()

P

axis

aggregate()

P

axis

align()

P

broadcast_axis , fill_axis , fill_value , level , limit and more. See the pandas.Series.align and pyspark.pandas.Series.align for detail.

all()

P

bool_only

any()

P

bool_only , skipna

apply()

P

convert_dtype

argmax()

Y

argmin()

Y

argsort()

P

axis , kind , order

asfreq

N

asof()

P

subset

astype()

P

copy , errors

at_time()

Y

autocorr()

Y

backfill()

P

downcast

between()

Y

between_time()

P

inclusive

bfill()

P

downcast

bool()

Y

clip()

P

axis

combine

N

combine_first()

Y

compare()

P

align_axis , result_names

convert_dtypes

N

copy()

Y

corr()

Y

count()

Y

cov()

Y

cummax()

P

axis

cummin()

P

axis

cumprod()

P

axis

cumsum()

P

axis

describe()

P

exclude , include

diff()

Y

div()

P

axis , fill_value , level

divide()

P

axis , fill_value , level

divmod()

P

axis , fill_value , level

dot()

Y

drop()

P

axis , errors

drop_duplicates()

P

ignore_index

TimedeltaIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

as_unit

N

asof()

Y

asof_locs

N

astype()

P

copy

ceil

N

copy()

Y

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

floor

N

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

groupby

N

holds_integer()

Y

identical()

Y

infer_objects

N

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_numeric()

Y

is_object()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

mean

N

median

N

memory_usage

N

min()

P

General Function API

API

Implemented

Missing parameters

array

N

bdate_range

N

concat()

P

copy , keys , levels , names , verify_integrity

crosstab

N

cut

N

date_range()

P

inclusive , unit

eval

N

factorize

N

from_dummies

N

get_dummies()

Y

infer_freq

N

interval_range

N

isna()

Y

isnull()

Y

json_normalize

N

lreshape

N

melt()

P

col_level , ignore_index

Expanding API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

ExpandingGroupby API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

var()

P

ddof , engine , engine_kwargs , numeric_only

Rolling API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

var()

P

ddof , engine , engine_kwargs , numeric_only

RollingGroupby API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

var()

P

ddof , engine , engine_kwargs , numeric_only

Window API

API

Implemented

Missing parameters

agg

N

aggregate

N

mean

N

std

N

sum

N

var

N

DataFrameGroupBy API

API

Implemented

Missing parameters

agg()

P

engine , engine_kwargs , func

aggregate()

P

engine , engine_kwargs , func

all()

Y

any()

P

skipna

apply()

Y

bfill()

Y

boxplot

N

corr

N

corrwith

N

count()

Y

cov

N

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe()

P

exclude , include , percentiles

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

fillna()

P

downcast

filter()

P

dropna

first()

Y

get_group()

P

obj

head()

Y

hist

N

idxmax()

P

axis , numeric_only

idxmin()

P

axis , numeric_only

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

nunique()

Y

ohlc

N

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

skew()

P

axis , numeric_only , skipna

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

take

N

transform()

P

engine , engine_kwargs

value_counts

N

var()

P

engine , engine_kwargs , numeric_only

GroupBy API

API

Implemented

Missing parameters

agg()

P

func

aggregate()

P

func

all()

Y

any()

P

skipna

apply()

Y

bfill()

Y

count()

Y

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe

N

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

first()

Y

get_group()

P

obj

head()

Y

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

ohlc

N

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

var()

P

engine , engine_kwargs , numeric_only

SeriesGroupBy API

API

Implemented

Missing parameters

agg()

P

engine , engine_kwargs , func

aggregate()

P

engine , engine_kwargs , func

all()

Y

any()

P

skipna

apply()

Y

bfill()

Y

corr

N

count()

Y

cov

N

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe

N

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

fillna()

P

downcast

filter()

P

dropna

first()

Y

get_group()

P

obj

head()

Y

hist

N

idxmax()

P

axis

idxmin()

P

axis

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

nlargest()

P

keep

nsmallest()

P

keep

nunique()

Y

ohlc

N

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

skew()

P

axis , numeric_only , skipna

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

take

N

transform()

P

engine , engine_kwargs

unique()

Y

value_counts()

P

bins , normalize

var()

P

engine , engine_kwargs , numeric_only