mars.dataframe.DataFrame.apply

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), dtypes=None, dtype=None, name=None, output_type=None, index=None, elementwise=None, skip_infer=False, **kwds)

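Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.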

Parameters

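func (function) – Function to apply to each column or row.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Axis along which the function is applied:
- 0 or 'index': apply function to each column.
- 1 or 'columns': apply function to each row.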
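raw (bool, default False) –
Determines whether a row or column is passed as a Series or an ndarray object:
- False : passes each row or column as a Series to the function.
- True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance.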
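result_type ({'expand', 'reduce', 'broadcast', None}, default None) –
These only act when axis=1 (columns):
- 'expand' : list-like results will be turned into columns.
- 'reduce' : returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
- 'broadcast' : results will be broadcast to the original shape of the DataFrame; the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However, if the applied function returns a Series, these are expanded to columns.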
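output_type ({'dataframe', 'series'}, default None) – Specify the type of the returned object. See Notes for more details.
dtypes (Series, default None) – Specify the dtypes of the returned DataFrame. See Notes for more details.
dtype (numpy.dtype, default None) – Specify the dtype of the returned Series. See Notes for more details.
name (str, default None) – Specify the name of the returned Series. See Notes for more details.
index (Index, default None) – Specify the index of the returned object. See Notes for more details.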
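elementwise (bool, default False) –
Specify whether func is an elementwise function:
- False : The function is not elementwise. Mars will try concatenating chunks in rows (when axis=0) or in columns (when axis=1) and then apply func onto the concatenated chunk. The concatenation step can cause extra latency.
- True : The function is elementwise. Mars will apply func to the original chunks. This will not introduce an extra concatenation step, which reduces overhead.
skip_infer (bool, default False) – Whether to infer dtypes when dtypes or output_type is not specified.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
**kwds – Additional keyword arguments to pass as keyword arguments to func.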
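
The elementwise flag above is only safe when applying func chunk by chunk gives the same answer as applying it to the concatenated data. The following is a minimal sketch of that distinction using plain pandas, with two hand-made row chunks standing in for Mars chunks. The chunk layout and the helper names (double, demean, chunked_apply) are assumptions made for illustration; the sketch does not reproduce how Mars actually partitions or schedules data.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Simulate two row-wise chunks (illustrative assumption, not Mars's real chunking).
chunks = [df.iloc[:2], df.iloc[2:]]


def double(col):
    # Elementwise: each output value depends only on the matching input value.
    return col * 2


def demean(col):
    # Not elementwise: each output value depends on the mean of the whole column.
    return col - col.mean()


def chunked_apply(func):
    # Apply func to every chunk independently, then stitch the pieces together.
    return pd.concat([chunk.apply(func, axis=0) for chunk in chunks])


# Compare the chunk-by-chunk result with the result on the full DataFrame.
double_matches = chunked_apply(double).equals(df.apply(double, axis=0))
demean_matches = chunked_apply(demean).equals(df.apply(demean, axis=0))
print(double_matches, demean_matches)
```

A function like double would be a candidate for elementwise=True, while a function like demean needs the concatenated column and should keep the default.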

Returns

Result of applying func along the given axis of the DataFrame.

Return type

Series or DataFrame

See also

DataFrame.applymap: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.

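Notes

When deciding the output dtypes and the shape of the return value, Mars will try applying func onto a mock DataFrame, and this apply call may fail. When that happens, you need to specify the type of the apply call's result (DataFrame or Series) in output_type.

For DataFrame output, you need to specify a list or a pandas Series as the dtypes of the output DataFrame. The index of the output can also be specified.
For Series output, you need to specify the dtype and name of the output Series.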
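
As a hedged illustration of these notes, the sketch below derives a dtypes Series by running a function on a one-row pandas sample, then passes it to apply together with output_type. The function add_ratio, the column names, and the two wrapper functions are hypothetical; the wrappers expect a mars.dataframe.DataFrame with integer columns A and B and only use keyword arguments listed in the signature above.

```python
import numpy as np
import pandas as pd


def add_ratio(row):
    # Hypothetical row-wise function that returns a Series with two fields.
    return pd.Series({"total": row["A"] + row["B"], "ratio": row["A"] / row["B"]})


# Run the function on a small pandas sample to learn the output dtypes.
sample = pd.DataFrame([[4, 9]], columns=["A", "B"])
dtypes = sample.apply(add_ratio, axis=1).dtypes


def apply_as_dataframe(mars_df):
    # DataFrame output: declare the result type and the column dtypes up front.
    return mars_df.apply(add_ratio, axis=1, output_type="dataframe", dtypes=dtypes)


def apply_as_series(mars_df):
    # Series output: declare the result type, the dtype and the name up front.
    return mars_df.apply(
        np.sum, axis=1, output_type="series", dtype=np.dtype("int64"), name="row_sum"
    )
```

Either wrapper would be called with a Mars DataFrame, for example apply_as_dataframe(md.DataFrame(...)).execute(), assuming md is mars.dataframe.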

Examples

>>> import numpy as np
>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0).execute()
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1).execute()
0    13
1    13
2    13
dtype: int64
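
The raw parameter changes what the reducing function receives. The sketch below uses plain pandas rather than Mars, on the assumption that Mars follows the pandas semantics its parameter description mirrors. The helper is_ndarray is hypothetical and only reports the type of the object handed to it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def is_ndarray(values):
    # Report whether apply handed us a NumPy array rather than a Series.
    return isinstance(values, np.ndarray)


# raw=False (default): each column arrives as a Series.
got_series = df.apply(is_ndarray, axis=0).tolist()
# raw=True: each column arrives as an ndarray.
got_ndarray = df.apply(is_ndarray, axis=0, raw=True).tolist()

# A NumPy reduction gives the same values either way.
sums_default = df.apply(np.sum, axis=0).tolist()
sums_raw = df.apply(np.sum, axis=0, raw=True).tolist()

print(got_series, got_ndarray, sums_default == sums_raw)
```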

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1).execute()
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a DataFrame

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand').execute()
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: md.Series([1, 2], index=['foo', 'bar']), axis=1).execute()
   foo  bar
0    1    2
1    1    2
2    1    2

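Passing result_type='broadcast' will ensure the result has the same shape, whether the function returns a list-like or a scalar, and broadcast it along the axis. The resulting column names will be the originals.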

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast').execute()
   A  B
0  1  2
1  1  2
2  1  2
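
The args and **kwds parameters are not covered by the examples above. The sketch below shows how extra positional and keyword arguments reach func. It uses plain pandas and assumes Mars forwards them the same way; the function scale and its parameters factor and offset are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def scale(col, factor, offset=0):
    # col is the column passed by apply; factor and offset come from args and kwds.
    return col * factor + offset


# args supplies factor=2 positionally; offset=1 is forwarded as a keyword argument.
result = df.apply(scale, axis=0, args=(2,), offset=1)
print(result)
```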