dask_expr._collection.Series

class dask_expr._collection.Series(expr)

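Series-like expression collection.

The constructor takes the expression that represents the query as input. The class is not meant to be instantiated directly. Instead, use one of the IO connectors from Dask.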

__init__(expr)

Methods

`__init__`(expr)
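`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, level, fill_value, axis])
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`align`(other[, join, axis, fill_value])	Align two objects on their axes with the specified join method.
`all`([axis, skipna, split_every])	Return whether all elements are True, potentially over an axis.
`analyze`([filename, format])	Output statistics about every node in the expression.
`any`([axis, skipna, split_every])	Return whether any element is True, potentially over an axis.
`apply`(function, *args[, meta, axis])	Parallel version of pandas.Series.apply
`astype`(dtypes)	Cast a pandas object to a specified dtype `dtype`.
`autocorr`([lag, split_every])	Compute the lag-N autocorrelation.
`between`(left, right[, inclusive])	Return boolean Series equivalent to left <= series <= right.
`bfill`([axis, limit])	Fill NA/NaN values by using the next valid observation to fill the gap.
`case_when`(caselist)	Replace values where the conditions are True.
`clear_divisions`()	Forget division information.
`clip`([lower, upper, axis])	Trim values at input threshold(s).
`combine`(other, func[, fill_value])	Combine the Series with a Series or scalar according to func.
`combine_first`(other)	Update null elements with the value in the same location in other.
`compute`([fuse, concatenate])	Compute this DataFrame.
`compute_current_divisions`([col, set_divisions])	Compute the current divisions of the DataFrame.
`copy`([deep])	Make a copy of the dataframe
`corr`(other[, method, min_periods, split_every])	Compute correlation with other Series, excluding missing values.
`count`([axis, numeric_only, split_every])	Count non-NA cells for each column or row.
`cov`(other[, min_periods, split_every])	Compute covariance with Series, excluding missing values.
`cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`describe`([split_every, percentiles, ...])	Generate descriptive statistics.
`diff`([periods, axis])	First discrete difference of element.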
`div`(other[, level, fill_value, axis])
`divide`(other[, level, fill_value, axis])
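`dot`(other[, meta])	Compute the dot product between the Series and the columns of other.
`drop_duplicates`([ignore_index, split_every, ...])
`dropna`()	Return a new Series with missing values removed.
`enforce_runtime_divisions`()	Enforce the current divisions at runtime.
`eq`(other[, level, fill_value, axis])
`explain`([stage, format])	Create a graph representation of the expression.
`explode`()	Transform each element of a list-like to a row.
`ffill`([axis, limit])	Fill NA/NaN values by propagating the last valid observation to the next valid one.
`fillna`([value, axis])	Fill NA/NaN values using the specified method.
`floordiv`(other[, level, fill_value, axis])
`from_dict`(data, *[, npartitions, orient, ...])	Construct a Dask DataFrame from a Python dictionary
`ge`(other[, level, fill_value, axis])
`get_partition`(n)	Get a dask DataFrame/Series representing the nth partition.
`groupby`(by, **kwargs)	Group Series using a mapper or by a Series of columns.
`gt`(other[, level, fill_value, axis])
`head`([n, npartitions, compute])	First n rows of the dataset
`idxmax`([axis, skipna, numeric_only, split_every])	Return index of first occurrence of maximum over requested axis.
`idxmin`([axis, skipna, numeric_only, split_every])	Return index of first occurrence of minimum over requested axis.
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`kurt`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`kurtosis`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.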
`le`(other[, level, fill_value, axis])
`lower_once`()
`lt`(other[, level, fill_value, axis])
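`map`(arg[, na_action, meta])	Map values of Series according to an input mapping or function.
`map_overlap`(func, before, after, *args[, ...])	Apply a function to each partition, sharing rows with adjacent partitions.
`map_partitions`(func, *args[, meta, ...])	Apply a Python function to each partition
`mask`(cond[, other])	Replace values where the condition is True.
`max`([axis, skipna, numeric_only, split_every])	Return the maximum of the values over the requested axis.
`mean`([axis, skipna, numeric_only, split_every])	Return the mean of the values over the requested axis.
`median`()	Return the median of the values over the requested axis.
`median_approximate`([method])	Return the approximate median of the values over the requested axis.
`memory_usage`([deep, index])	Return the memory usage of the Series.
`memory_usage_per_partition`([index, deep])	Return the memory usage of each partition
`min`([axis, skipna, numeric_only, split_every])	Return the minimum of the values over the requested axis.
`mod`(other[, level, fill_value, axis])
`mode`([dropna, split_every])	Return the mode(s) of the Series.
`mul`(other[, level, fill_value, axis])
`ne`(other[, level, fill_value, axis])
`nlargest`([n, split_every])	Return the largest n elements.
`notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest`([n, split_every])	Return the smallest n elements.
`nunique`([dropna, split_every, split_out])	Return number of unique elements in the object.
`nunique_approx`([split_every])	Approximate number of unique rows.
`optimize`([fuse])	Optimize the DataFrame.
`persist`([fuse])	Persist this dask collection into memory
`pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`pow`(other[, level, fill_value, axis])
`pprint`()	Output a string representation of the DataFrame.
`prod`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`product`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`quantile`([q, method])	Approximate quantiles of Series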
`radd`(other[, level, fill_value, axis])
`random_split`(frac[, random_state, shuffle])	Pseudorandomly split dataframe into different pieces row-wise
`rdiv`(other[, level, fill_value, axis])
`reduction`(chunk[, aggregate, combine, meta, ...])	Generic row-wise reductions.
`rename`(index[, sorted_index])	Alter Series index labels or name
`rename_axis`([mapper, index, columns, axis])	Set the name of the axis for the index or columns.
`repartition`([divisions, npartitions, ...])	Repartition a collection
`replace`([to_replace, value, regex])	Replace values given in to_replace with value.
`resample`(rule[, closed, label])	Resample time-series data.
`reset_index`([drop])	Reset the index to the default index.
`rfloordiv`(other[, level, fill_value, axis])
`rmod`(other[, level, fill_value, axis])
`rmul`(other[, level, fill_value, axis])
`rolling`(window, **kwargs)	Provides rolling transformations.
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, level, fill_value, axis])
`rsub`(other[, level, fill_value, axis])
`rtruediv`(other[, level, fill_value, axis])
`sample`([n, frac, replace, random_state])	Random sample of items
`sem`([axis, skipna, ddof, split_every, ...])	Return unbiased standard error of the mean over requested axis.
`shift`([periods, freq, axis])	Shift index by desired number of periods with an optional time freq.
`shuffle`([on, ignore_index, npartitions, ...])	Rearrange DataFrame into new partitions
`simplify`()
`skew`([axis, bias, nan_policy, numeric_only])	Return unbiased skew over requested axis.
`squeeze`()	Squeeze 1 dimensional axis objects into scalars.
`std`([axis, skipna, ddof, numeric_only, ...])	Return sample standard deviation over requested axis.
`sub`(other[, level, fill_value, axis])
`sum`([axis, skipna, numeric_only, min_count, ...])	Return the sum of the values over the requested axis.
`tail`([n, compute])	Last n rows of the dataset
`to_backend`([backend])	Move to a new DataFrame backend
`to_bag`([index, format])	Create a Dask Bag from a Series
`to_csv`(filename, **kwargs)	See dd.to_csv docstring for more information
`to_dask_array`([lengths, meta, optimize])	Convert a dask DataFrame to a dask array.
`to_dask_dataframe`(*args, **kwargs)	Convert to a legacy dask-dataframe collection
`to_delayed`([optimize_graph])	Convert into a list of `dask.delayed` objects, one per partition.
`to_frame`([name])	Convert Series to DataFrame.
`to_hdf`(path_or_buf, key[, mode, append])	See dd.to_hdf docstring for more information
`to_json`(filename, *args, **kwargs)	See dd.to_json docstring for more information
`to_legacy_dataframe`([optimize])	Convert to a legacy dask-dataframe collection
`to_orc`(path, *args, **kwargs)	See dd.to_orc docstring for more information
`to_records`([index, lengths])
`to_sql`(name, uri[, schema, if_exists, ...])
`to_string`([max_rows])	Render a string representation of the Series.
`to_timestamp`([freq, how])	Cast to DatetimeIndex of Timestamps, at beginning of period.
`truediv`(other[, level, fill_value, axis])
`unique`([split_every, split_out, shuffle_method])	Return Series of unique values in the object.
`value_counts`([sort, ascending, dropna, ...])	Return a Series containing counts of unique values.
`var`([axis, skipna, ddof, numeric_only, ...])	Return unbiased variance over requested axis.
`visualize`([tasks])	Visualize the expression or task graph
`where`(cond[, other])	Replace values where the condition is False.

Attributes

`axes`
`columns`
`dask`
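`divisions`	Tuple of `npartitions + 1` values, in ascending order, marking the lower/upper bounds of each partition's index.
`dtype`
`dtypes`	Return data types
`expr`
`index`	Return dask Index instance
`is_monotonic_decreasing`	Return boolean if values in the object are monotonically decreasing.
`is_monotonic_increasing`	Return boolean if values in the object are monotonically increasing.
`known_divisions`	Whether the divisions are known.
`loc`	Purely label-location based indexer for selection by label.
`name`
`nbytes`	Number of bytes
`ndim`	Return dimensionality
`npartitions`	Return number of partitions
`partitions`	Slice dataframe by partitions
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	Size of the Series or DataFrame as a Delayed object.
`values`	Return a dask.array of the values of this dataframe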
