pyspark.pandas.DataFrame.loc

property DataFrame.loc

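Access a group of rows and columns by label(s) or a boolean Series.

.loc[] is primarily label based, but may also be used with a conditional boolean Series derived from the DataFrame or Series.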

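Allowed inputs are:

A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index), for column selection.
A list or array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'.
A conditional boolean Series derived from the DataFrame or Series.
A boolean array of the same length as the column axis being sliced, e.g. [True, False, True].
A boolean pandas Series that can be aligned to the column axis being sliced. The index of the key will be aligned before masking.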
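
As a rough illustration of the last item, the following pure-Python sketch models what aligning a boolean key to the column axis means. It does not use pandas-on-Spark: `select_columns_by_mask` and the dictionary used as a mask are hypothetical stand-ins for a boolean pandas Series, assumed here only to show that the lookup is by label rather than by position.

```python
def select_columns_by_mask(columns, mask):
    """Return the columns whose label maps to True in `mask`.

    `mask` plays the role of a boolean Series: its keys are labels,
    so each column is looked up by label, not by position.
    """
    return [label for label in columns if mask[label]]


columns = ['max_speed', 'shield']
# The mask lists the labels in a different order than the columns.
mask = {'shield': True, 'max_speed': False}
selected = select_columns_by_mask(columns, mask)
print(selected)  # ['shield']
```

Because the key is matched by label, the order in which the mask lists its entries does not change which columns are kept.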

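Inputs that pandas allows but that are not allowed here:

A boolean array of the same length as the row axis being sliced, e.g. [True, False, True].
A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above).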

Note

MultiIndex is not supported yet.

Note

Unlike usual Python slices, both the start and the stop are included, and a slice step is not allowed.
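
The difference from ordinary Python slicing can be sketched in plain Python. This is only a model of the inclusive-stop rule on an ordered list of unique labels, not how pandas-on-Spark implements it; `label_slice` is a hypothetical helper introduced for illustration.

```python
def label_slice(labels, start, stop):
    """Return the labels from `start` through `stop`, both included."""
    first = labels.index(start)
    last = labels.index(stop)
    return labels[first:last + 1]


labels = ['cobra', 'viper', 'sidewinder']

positional = labels[0:1]                          # usual Python slice: stop is excluded
by_label = label_slice(labels, 'cobra', 'viper')  # label slice: stop is included

print(positional)  # ['cobra']
print(by_label)    # ['cobra', 'viper']
```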

Note

With a list or array of labels for row selection, pandas-on-Spark behaves as a filter and does not reorder the rows by the labels.
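
To make the contrast concrete, here is a small pure-Python sketch of the two behaviors. Both functions are hypothetical models written for this illustration: `filter_rows` mimics the filter semantics described in the note, while `reorder_rows` mimics pandas, which returns rows in the order the labels were requested.

```python
def filter_rows(index, requested):
    """Keep the rows whose label was requested, in the original row order."""
    wanted = set(requested)
    return [label for label in index if label in wanted]


def reorder_rows(index, requested):
    """Return labels in the requested order; assumes each one exists in `index`."""
    return [label for label in requested if label in index]


index = ['cobra', 'viper', 'sidewinder']
requested = ['sidewinder', 'viper']

print(filter_rows(index, requested))   # ['viper', 'sidewinder']
print(reorder_rows(index, requested))  # ['sidewinder', 'viper']
```

Code that relies on the result following the order of the requested labels would therefore need an explicit sort or reindex step after the selection.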

See also

Series.loc: Access a group of values using labels.

Examples

Getting values

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note that this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note that using [[]] returns a DataFrame. Also note that pandas-on-Spark behaves as a filter and does not reorder the rows by the labels.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

>>> df.loc[['sidewinder', 'viper']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column.

>>> df.loc['cobra', 'shield']
2

List of labels for row.

>>> df.loc[['cobra'], 'shield']
cobra    2
Name: shield, dtype: int64

List of labels for column.

>>> df.loc['cobra', ['shield']]
shield    2
Name: cobra, dtype: int64

List of labels for row and column.

>>> df.loc[['cobra'], ['shield']]
       shield
cobra       2

Slice with labels for row and single label for column. Note that both the start and the stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Conditional that returns a boolean Series.

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series, with column labels specified.

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

A boolean array of the same length as the column axis being sliced.

>>> df.loc[:, [False, True]]
            shield
cobra            2
viper            5
sidewinder       8

A boolean Series that can be aligned to the column axis being sliced.

>>> df.loc[:, pd.Series([False, True], index=['max_speed', 'shield'])]
            shield
cobra            2
viper            5
sidewinder       8

Setting values

Set value for all items matching the list of labels.

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row.

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column.

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for an entire list of columns.

>>> df.loc[:, ['max_speed', 'shield']] = 100
>>> df
            max_speed  shield
cobra             100     100
viper             100     100
sidewinder        100     100

Set value with a Series.

>>> df.loc[:, 'shield'] = df['shield'] * 2
>>> df
            max_speed  shield
cobra             100     200
viper             100     200
sidewinder        100     200

Getting values on a DataFrame with an index that has integer labels

Another example using integers for the index.

>>> df = ps.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=[7, 8, 9],
...                   columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. Note that both the start and the stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8
