cudf.core.groupby.groupby.DataFrameGroupBy.corr
- DataFrameGroupBy.corr(method='pearson', min_periods=1, numeric_only: bool = False)
Compute pairwise correlation of columns, excluding NA/null values.
- Parameters:
- method: {"pearson", "kendall", "spearman"} or callable, default "pearson"
Correlation method to use. Currently only the Pearson correlation coefficient is supported.
- min_periods: int, optional
Minimum number of observations required per pair of columns to have a valid result.
- Returns:
- DataFrame
Correlation matrix.
Examples
>>> import cudf
>>> gdf = cudf.DataFrame({
...     "id": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
...     "val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
...     "val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
...     "val3": [4, 5, 6, 1, 2, 9, 8, 5, 1]})
>>> gdf
  id  val1  val2  val3
0  a     5     4     4
1  a     4     5     5
2  a     6     6     6
3  b     4     1     1
4  b     8     2     2
5  b     7     9     9
6  c     4     8     8
7  c     5     5     5
8  c     2     1     1
>>> gdf.groupby("id").corr(method="pearson")
             val1      val2      val3
id
a   val1  1.000000  0.500000  0.500000
    val2  0.500000  1.000000  1.000000
    val3  0.500000  1.000000  1.000000
b   val1  1.000000  0.385727  0.385727
    val2  0.385727  1.000000  1.000000
    val3  0.385727  1.000000  1.000000
c   val1  1.000000  0.714575  0.714575
    val2  0.714575  1.000000  1.000000
    val3  0.714575  1.000000  1.000000
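To make the val1/val2 entries in the example output easier to check by hand, here is a minimal pure-Python sketch that recomputes the Pearson coefficient within each group. It does not use cuDF and is not how cuDF implements `corr`. The helper name `pearson` is hypothetical, and its handling of missing values (`None` pairs are dropped, and `None` is returned when fewer than `min_periods` pairs remain) is an assumption based on the parameter descriptions above.

```python
import math


def pearson(xs, ys, min_periods=1):
    """Pearson r over the pairs where both values are present.

    Hypothetical helper for illustration only: returns None when fewer
    than `min_periods` complete pairs are available (assumed semantics).
    """
    # Drop any pair in which either observation is missing.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs or len(pairs) < min_periods:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# Same data as the cuDF example above.
ids = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
val1 = [5, 4, 6, 4, 8, 7, 4, 5, 2]
val2 = [4, 5, 6, 1, 2, 9, 8, 5, 1]

# Split the two value columns by group key, preserving row order.
groups = {}
for key, x, y in zip(ids, val1, val2):
    xs, ys = groups.setdefault(key, ([], []))
    xs.append(x)
    ys.append(y)

# One coefficient per group, rounded to the six decimals shown in the table.
result = {key: round(pearson(xs, ys), 6) for key, (xs, ys) in groups.items()}
print(result)
```

Because val3 is an exact copy of val2 in the example data, the val1/val3 entries match val1/val2 and the val2/val3 entries are 1.0.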