pyspark.pandas.DataFrame.corrwith

DataFrame.corrwith(other: Union[DataFrame, Series], axis: Union[int, str] = 0, drop: bool = False, method: str = 'pearson') → Series

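Compute pairwise correlation.

Pairwise correlation is computed between the rows or columns of a DataFrame and the rows or columns of a Series or DataFrame. The DataFrames are first aligned along both axes before the correlations are computed.

New in version 3.4.0.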

Parameters

other DataFrame, Series

Object with which to compute correlations.

axis int, default 0 or ‘index’

Currently only 0 is supported.

drop bool, default False

Drop missing indices from the result.

method {‘pearson’, ‘spearman’, ‘kendall’}

pearson : standard correlation coefficient
spearman : Spearman rank correlation
kendall : Kendall Tau correlation coefficient
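To make the three `method` values concrete, here is a minimal plain-Python sketch of what each coefficient measures. It is not the pyspark implementation: the helper names `pearson`, `average_ranks`, `spearman`, and `kendall` are hypothetical, and the Kendall variant is assumed to be tau-b. The inputs are columns taken from the `df1` and `df2` examples on this page.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall(x, y):
    """Kendall tau-b (assumed variant), counting pairs in O(n^2)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: not counted
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + only_x_tied)
                      * (concordant + discordant + only_y_tied))
    return (concordant - discordant) / denom if denom else math.nan


# Columns copied from the df1 / df2 examples on this page.
print(round(pearson([5, 3, 6, 4], [5, 8, 4, 3]), 6))     # -0.597614
print(round(spearman([1, 5, 7, 8], [11, 2, 4, 3]), 6))   # -0.4
print(kendall([1, 5, 7, 8], [5, 3, 6, 4]))               # 0.0
```

The three printed values agree with the entries for `df2.corrwith(df1.X)` (column A), `df1.corrwith(df2.B, method="spearman")` (column A), and `df1.corrwith(df2, method="kendall")` (column A) shown on this page.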

Returns

Series: Pairwise correlations.
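Which labels appear in the returned Series depends on label alignment and on `drop`. The sketch below models that with plain dictionaries. It assumes rows are already aligned by position and that labels present in only one input are reported as NaN unless `drop=True`, which is consistent with the outputs shown on this page. `corrwith_sketch` and `pearson` are hypothetical helpers, not pyspark functions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def corrwith_sketch(left, right, drop=False):
    """left/right: dict of column label -> list of values (rows pre-aligned)."""
    # Only labels present on both sides can be correlated.
    shared = [label for label in left if label in right]
    result = {label: pearson(left[label], right[label]) for label in shared}
    if not drop:
        # Assumed behaviour: unmatched labels are kept as NaN placeholders.
        for label in list(left) + list(right):
            result.setdefault(label, math.nan)
    return dict(sorted(result.items()))


df1 = {"A": [1, 5, 7, 8], "X": [5, 8, 4, 3], "C": [10, 4, 9, 3]}
subset = {"X": df1["X"], "C": df1["C"]}

kept = corrwith_sketch(df1, subset)
dropped = corrwith_sketch(df1, subset, drop=True)
print(list(kept))     # ['A', 'C', 'X']
print(list(dropped))  # ['C', 'X']
```

With `drop=False` the unmatched label `A` stays in the result as NaN, mirroring the `df1.corrwith(df1[["X", "C"]])` output on this page. With `drop=True` only the labels shared by both inputs remain.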

See also

DataFrame.corr: Compute pairwise correlation of columns.

Examples

>>> import pyspark.pandas as ps
>>> df1 = ps.DataFrame({
...         "A": [1, 5, 7, 8],
...         "X": [5, 8, 4, 3],
...         "C": [10, 4, 9, 3]})
>>> df1.corrwith(df1[["X", "C"]]).sort_index()
A    NaN
C    1.0
X    1.0
dtype: float64

>>> df2 = ps.DataFrame({
...         "A": [5, 3, 6, 4],
...         "B": [11, 2, 4, 3],
...         "C": [4, 3, 8, 5]})

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2).sort_index()
A   -0.041703
B         NaN
C    0.395437
X         NaN
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2, method="kendall").sort_index()
A    0.0
B    NaN
C    0.0
X    NaN
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2.B, method="spearman").sort_index()
A   -0.4
C    0.8
X   -0.2
dtype: float64

>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df2.corrwith(df1.X).sort_index()
A   -0.597614
B   -0.151186
C   -0.642857
dtype: float64
