Testing#

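The testing module provides a number of functions and helper utilities for unit testing.

Note

The testing module is not imported by default, which keeps the import of the main polars module fast. Either import polars.testing and use that namespace, or import the specific functions you need from the full module path, for example: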

from polars.testing import assert_frame_equal, assert_series_equal

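Asserts#

Polars provides some standard asserts for use in unit tests:

`testing.assert_frame_equal`(left, right, *[, ...])	Assert that the left and right frames are equal.
`testing.assert_frame_not_equal`(left, right, *)	Assert that the left and right frames are not equal.
`testing.assert_series_equal`(left, right, *)	Assert that the left and right Series are equal.
`testing.assert_series_not_equal`(left, right, *)	Assert that the left and right Series are not equal.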

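Parametric testing#

For more details on property-based testing, strategies, and library integrations, see the Hypothesis library documentation.

Polars strategies#

Polars provides the following hypothesis testing strategies:

`testing.parametric.dataframes`([cols, lazy, ...])	Hypothesis strategy for producing Polars DataFrames or LazyFrames.
`testing.parametric.dtypes`(*[, ...])	Create a strategy for generating Polars `DataType` objects.
`testing.parametric.lists`(inner_dtype, *[, ...])	Create a strategy for generating Polars `List` data with the given inner dtype.
`testing.parametric.series`(*[, name, dtype, ...])	Hypothesis strategy for producing Polars Series.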

Strategy helpers#

`testing.parametric.column`([name, dtype, ...])	Define a column for use with the `dataframes` strategy.
`testing.parametric.columns`([cols, dtype, ...])	Define multiple columns for use with the `dataframes` strategy.
`testing.parametric.create_list_strategy`([...])	Create a strategy for generating Polars `List` data.

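Profiles#

Several standard/named hypothesis profiles are provided:

fast: runs 100 iterations.
balanced: runs 1,000 iterations.
expensive: runs 10,000 iterations.

The load/set helper functions let you access these profiles directly, set your preferred profile (the default is fast), or set a custom number of iterations.

`testing.parametric.load_profile`([profile, ...])	Load a named (or custom) hypothesis profile for use with the parametric tests.
`testing.parametric.set_profile`(profile)	Set the env var `POLARS_HYPOTHESIS_PROFILE` to the given profile name/value.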

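Approximate profile timings:

Running the polars parametric unit tests on version 0.17.6, against release and debug builds, on a 12-core machine using xdist -n auto, gave the following timings (these values are indicative only and may vary significantly depending on your hardware setup):

Profile	Iterations	Release	Debug
`fast`	100	~6 secs	~8 secs
`balanced`	1,000	~22 secs	~30 secs
`expensive`	10,000	~3 mins 5 secs	~4 mins 45 secs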

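Examples#

Basic: Create a parametric unit test that will receive a series of generated DataFrames, each with 5 numeric columns, where each generated value has a 10% chance of being null (this is distinct from NaN).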

import polars as pl
from polars.testing.parametric import dataframes
from polars import NUMERIC_DTYPES

from hypothesis import given

@given(
    dataframes(
        cols=5,
        allow_null=True,
        allowed_dtypes=NUMERIC_DTYPES,
    )
)
def test_numeric(df: pl.DataFrame):
    assert all(df[col].dtype.is_numeric() for col in df.columns)

    # Example frame:
    # ┌──────┬────────┬───────┬────────────┬────────────┐
    # │ col0 ┆ col1   ┆ col2  ┆ col3       ┆ col4       │
    # │ ---  ┆ ---    ┆ ---   ┆ ---        ┆ ---        │
    # │ u8   ┆ i16    ┆ u16   ┆ i32        ┆ f64        │
    # ╞══════╪════════╪═══════╪════════════╪════════════╡
    # │ 54   ┆ -29096 ┆ 485   ┆ 2147483647 ┆ -2.8257e14 │
    # │ null ┆ 7508   ┆ 37338 ┆ 7264       ┆ 1.5        │
    # │ 0    ┆ 321    ┆ null  ┆ 16996      ┆ NaN        │
    # │ 121  ┆ -361   ┆ 63204 ┆ 1          ┆ 1.1443e235 │
    # └──────┴────────┴───────┴────────────┴────────────┘

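Intermediate: Integrate hypothesis-native strategies into specifically named columns, generating a series of LazyFrames with a minimum size of five rows and values that conform to the given strategies: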

import polars as pl
from polars.testing.parametric import column, dataframes

import hypothesis.strategies as st
from hypothesis import given
from string import ascii_letters, digits

id_chars = ascii_letters + digits

@given(
    dataframes(
        cols=[
            column("id", strategy=st.text(min_size=4, max_size=4, alphabet=id_chars)),
            column("ccy", strategy=st.sampled_from(["GBP", "EUR", "JPY", "USD"])),
            column("price", strategy=st.floats(min_value=0.0, max_value=1000.0)),
        ],
        min_size=5,
        lazy=True,
    )
)
def test_price_calculations(lf: pl.LazyFrame):
    ...
    print(lf.collect())

    # Example frame:
    # ┌──────┬─────┬─────────┐
    # │ id   ┆ ccy ┆ price   │
    # │ ---  ┆ --- ┆ ---     │
    # │ str  ┆ str ┆ f64     │
    # ╞══════╪═════╪═════════╡
    # │ A101 ┆ GBP ┆ 1.1     │
    # │ 8nIn ┆ JPY ┆ 1.5     │
    # │ QHoO ┆ EUR ┆ 714.544 │
    # │ i0e0 ┆ GBP ┆ 0.0     │
    # │ 0000 ┆ USD ┆ 999.0   │
    # └──────┴─────┴─────────┘

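Advanced: Create and use a List[UInt8] dtype strategy as a hypothesis composite that generates pairs of small integer values, where the first value in each nested pair is always less than or equal to the second: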

import polars as pl
from polars.testing.parametric import column, dataframes, lists

import hypothesis.strategies as st
from hypothesis import given

@st.composite
def uint8_pairs(draw: st.DrawFn):
    uints = lists(pl.UInt8, size=2)
    pairs = list(zip(draw(uints), draw(uints)))
    return [sorted(ints) for ints in pairs]

@given(
    dataframes(
        cols=[
            column("colx", strategy=uint8_pairs()),
            column("coly", strategy=uint8_pairs()),
            column("colz", strategy=uint8_pairs()),
        ],
        min_size=3,
        max_size=3,
    )
)
def test_miscellaneous(df: pl.DataFrame): ...

    # Example frame:
    # ┌─────────────────────────┬─────────────────────────┬──────────────────────────┐
    # │ colx                    ┆ coly                    ┆ colz                     │
    # │ ---                     ┆ ---                     ┆ ---                      │
    # │ list[list[i64]]         ┆ list[list[i64]]         ┆ list[list[i64]]          │
    # ╞═════════════════════════╪═════════════════════════╪══════════════════════════╡
    # │ [[143, 235], [75, 101]] ┆ [[143, 235], [75, 101]] ┆ [[31, 41], [57, 250]]    │
    # │ [[87, 186], [174, 179]] ┆ [[87, 186], [174, 179]] ┆ [[112, 213], [149, 221]] │
    # │ [[23, 85], [7, 86]]     ┆ [[23, 85], [7, 86]]     ┆ [[22, 255], [27, 28]]    │
    # └─────────────────────────┴─────────────────────────┴──────────────────────────┘