st.cache_data


Tip

This page only contains information on the st.cache_data API. For a deeper dive into caching and how to use it, check out Caching.

Decorator to cache functions that return data (e.g. dataframe transforms, database queries, ML inference).

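Cached objects are stored in "pickled" form, which means that the return value of a cached function must be pickleable. Each caller of the cached function gets its own copy of the cached data.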
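To see why pickling gives each caller a separate copy, here is a minimal stdlib sketch. `PickledCache` is a hypothetical toy class, not Streamlit's implementation: it stores each return value as pickled bytes and unpickles a fresh object on every cache hit, so mutating one result cannot leak into the cached entry or into other callers.

```python
import pickle
from typing import Any, Callable, Dict, Hashable


class PickledCache:
    """Hypothetical toy cache that stores return values as pickled bytes."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, bytes] = {}
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        if key not in self._store:
            self.misses += 1
            # pickle.dumps raises if the value is not pickleable, mirroring
            # the "return value must be pickleable" requirement.
            self._store[key] = pickle.dumps(compute())
        # Every hit unpickles a brand-new object, so callers never share state.
        return pickle.loads(self._store[key])


cache = PickledCache()
a = cache.get_or_compute("rows", lambda: [1, 2, 3])  # miss: computes and stores
b = cache.get_or_compute("rows", lambda: [1, 2, 3])  # hit: fresh copy
a.append(4)  # mutating one copy...
print(b)     # ...leaves the other untouched: [1, 2, 3]
```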

You can clear a function's cache with func.clear() or clear the entire cache with st.cache_data.clear().

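A function's arguments must be hashable to cache it. If you have an unhashable argument (like a database connection) or an argument you want to exclude from caching, use an underscore prefix in the argument name. In this case, Streamlit will return a cached value when all other arguments match a previous function call. Alternatively, you can declare custom hashing functions with hash_funcs.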
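A minimal sketch of the underscore rule, assuming the cache key is built from the bound arguments. `cache_key` and `fetch_rows` are hypothetical helpers; Streamlit's real hasher also covers the function's source code and hashes values in its own way.

```python
import inspect
from typing import Any, Callable, Tuple


def cache_key(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, ...]:
    """Build a toy cache key, skipping parameters whose names start with "_"."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return tuple(
        (name, value)
        for name, value in bound.arguments.items()
        if not name.startswith("_")  # underscore parameters are not hashed
    )


def fetch_rows(_db_connection: object, num_rows: int) -> list:
    return []  # placeholder body; only the signature matters here


# Two different connection objects produce the same key, so the second
# call would be served from the cache.
k1 = cache_key(fetch_rows, object(), num_rows=10)
k2 = cache_key(fetch_rows, object(), num_rows=10)
print(k1 == k2)  # True
```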

To cache global resources, use st.cache_resource instead. Learn more about caching at https://docs.streamlit.io/develop/concepts/architecture/caching.

Function signature
st.cache_data(func=None, *, ttl, max_entries, show_spinner, persist, experimental_allow_widgets, hash_funcs=None)
Parameters
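func (callable)	The function to cache. Streamlit hashes the function's source code.
ttl (float, timedelta, str, or None)	The maximum time to keep an entry in the cache. Can be one of: `None` if cache entries should never expire (the default); a number specifying the time in seconds; a string specifying the time in a format supported by Pandas's Timedelta constructor, e.g. `"1d"`, `"1.5 days"`, or `"1h23s"`; or a `timedelta` object from Python's built-in datetime library, e.g. `timedelta(days=1)`. Note that `ttl` will be ignored if `persist="disk"` or `persist=True`.
max_entries (int or None)	The maximum number of entries to keep in the cache, or None for an unbounded cache. When a new entry is added to a full cache, the oldest cached entry will be removed. Defaults to None.
show_spinner (bool or str)	Enable the spinner. The default is True, which shows a spinner when there is a "cache miss" and the cached data is being created. If a string, the value of show_spinner is used as the spinner text.
persist ("disk", bool, or None)	Optional location to persist cached data to. Passing "disk" (or True) persists the cached data to the local disk. None (or False) disables persistence. The default is None.
experimental_allow_widgets (bool)	Deprecated. Cached widget replay was removed in version 1.38, so please remove the `experimental_allow_widgets` parameter from your caching decorators; the parameter itself will be removed in a future version. It formerly allowed widgets to be used in the cached function and defaulted to False.
hash_funcs (dict or None)	Mapping of types or fully qualified names to hash functions. This is used to override the behavior of the hasher inside Streamlit's caching mechanism: when the hasher encounters an object, it will first check to see if its type matches a key in this dict and, if so, will use the provided function to generate a hash for it. See below for an example of how this can be used.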
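The `ttl` and `max_entries` semantics above can be modeled with a small toy cache. This is a sketch under stated assumptions, not Streamlit's implementation: `ToyCache` and `ttl_seconds` are hypothetical, string TTLs (parsed with Pandas in Streamlit) are left out to keep it stdlib-only, and "oldest" is taken to mean the entry that was stored first. An injectable clock makes the expiry deterministic.

```python
import time
from collections import OrderedDict
from datetime import timedelta
from typing import Any, Callable, Hashable, Optional, Union

Ttl = Union[int, float, timedelta, None]


def ttl_seconds(ttl: Ttl) -> Optional[float]:
    """Normalize ttl to seconds; None means entries never expire."""
    if ttl is None:
        return None
    if isinstance(ttl, timedelta):
        return ttl.total_seconds()
    return float(ttl)  # plain number of seconds (string TTLs not modeled)


class ToyCache:
    """Hypothetical cache honoring ttl and max_entries (not Streamlit's code)."""

    def __init__(self, ttl: Ttl = None, max_entries: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds(ttl)
        self.max_entries = max_entries
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        if key in self._entries:
            stored_at, value = self._entries[key]
            if self.ttl is None or now - stored_at <= self.ttl:
                return value  # fresh hit: compute() is not called
            del self._entries[key]  # expired: drop it and recompute below
        if self.max_entries is not None and len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # cache full: evict the oldest entry
        value = compute()
        self._entries[key] = (now, value)
        return value

    def keys(self) -> list:
        return list(self._entries)


# Usage with a fake clock so that expiry is deterministic.
fake_time = [0.0]
cache = ToyCache(ttl=timedelta(minutes=1), max_entries=2, clock=lambda: fake_time[0])
calls = []


def load(key: str) -> str:
    calls.append(key)
    return key.upper()


cache.get("a", lambda: load("a"))  # miss
cache.get("a", lambda: load("a"))  # hit: load() is not called again
cache.get("b", lambda: load("b"))  # miss
cache.get("c", lambda: load("c"))  # miss; the cache was full, so "a" is evicted
fake_time[0] = 120.0
cache.get("c", lambda: load("c"))  # entry is 120 s old (> 60 s): recomputed
print(calls)         # ['a', 'b', 'c', 'c']
print(cache.keys())  # ['b', 'c']
```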

Example

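import pandas as pd
import streamlit as st

# Placeholder URLs (hypothetical); point these at your own CSV files.
DATA_URL_1 = "https://example.com/data_1.csv"
DATA_URL_2 = "https://example.com/data_2.csv"

@st.cache_data
def fetch_and_clean_data(url):
    # Fetch data from URL here, and then clean it up.
    # (Reading a CSV and dropping empty rows is only an illustration.)
    data = pd.read_csv(url).dropna()
    return data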

d1 = fetch_and_clean_data(DATA_URL_1)
# Actually executes the function, since this is the first time it was
# encountered.

d2 = fetch_and_clean_data(DATA_URL_1)
# Does not execute the function. Instead, returns its previously computed
# value. d1 and d2 now hold equal data, but each is its own copy.

d3 = fetch_and_clean_data(DATA_URL_2)
# This is a different URL, so the function executes.

To set the persist parameter, use this command as follows:

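import pandas as pd
import streamlit as st

@st.cache_data(persist="disk")
def fetch_and_clean_data(url):
    # Fetch data from URL here, and then clean it up.
    # (Reading a CSV and dropping empty rows is only an illustration.)
    data = pd.read_csv(url).dropna()
    return data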
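Conceptually, persistence means the pickled result outlives the process. The hypothetical `disk_cached` decorator below sketches that idea with one pickle file per argument tuple in a temporary directory. It is not how Streamlit lays out its on-disk cache, and it does not model `ttl` or `max_entries`.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable

CACHE_DIR = tempfile.mkdtemp(prefix="toy_cache_")  # hypothetical cache location


def disk_cached(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that persists each result as a pickle file."""

    def wrapper(*args: Any) -> Any:
        # Derive a file name from the function name and its arguments.
        key = hashlib.sha256(repr((func.__qualname__, args)).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pickle")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # hit: read the persisted result
        result = func(*args)
        with open(path, "wb") as f:
            pickle.dump(result, f)  # miss: persist the result for later runs
        return result

    return wrapper


calls = []


@disk_cached
def square(x: int) -> int:
    calls.append(x)
    return x * x


square(4)  # miss: runs the body and writes one file
square(4)  # hit: loaded from disk, the body does not run
print(calls)                       # [4]
print(len(os.listdir(CACHE_DIR)))  # 1
```

Because there is no in-memory layer, every hit reads the file, which is what would let a fresh process pointed at the same directory reuse the results.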

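By default, all parameters to a cached function must be hashable. Any parameter whose name begins with _ will not be hashed. You can use this as an "escape hatch" for parameters that are not hashable: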

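import pandas as pd
import streamlit as st

@st.cache_data
def fetch_and_clean_data(_db_connection, num_rows):
    # Fetch data from _db_connection here, and then clean it up.
    # ("my_table" is a hypothetical table name.)
    data = pd.read_sql(f"SELECT * FROM my_table LIMIT {int(num_rows)}", _db_connection)
    return data

# make_database_connection() is a placeholder for your own connection factory.
connection = make_database_connection()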
d1 = fetch_and_clean_data(connection, num_rows=10)
# Actually executes the function, since this is the first time it was
# encountered.

another_connection = make_database_connection()
d2 = fetch_and_clean_data(another_connection, num_rows=10)
# Does not execute the function. Instead, returns its previously computed
# value, even though the _db_connection argument was different in
# the two calls.

A cached function's cache can be cleared programmatically:

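import pandas as pd
import streamlit as st

@st.cache_data
def fetch_and_clean_data(_db_connection, num_rows):
    # Fetch data from _db_connection here, and then clean it up.
    # ("my_table" is a hypothetical table name.)
    data = pd.read_sql(f"SELECT * FROM my_table LIMIT {int(num_rows)}", _db_connection)
    return data

# make_database_connection() is a placeholder for your own connection factory.
connection = make_database_connection()
fetch_and_clean_data.clear(connection, 50)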
# Clear the cached entry for the arguments provided.

fetch_and_clean_data.clear()
# Clear all cached entries for this function.

To override the default hashing behavior, pass a custom hash function. You can do that by mapping a type (e.g. datetime.datetime) to a hash function (lambda dt: dt.isoformat()) like this:

import streamlit as st
import datetime

@st.cache_data(hash_funcs={datetime.datetime: lambda dt: dt.isoformat()})
def convert_to_utc(dt: datetime.datetime):
    return dt.astimezone(datetime.timezone.utc)

Alternatively, you can map the type's fully qualified name (e.g. "datetime.datetime") to the hash function instead:

import streamlit as st
import datetime

@st.cache_data(hash_funcs={"datetime.datetime": lambda dt: dt.isoformat()})
def convert_to_utc(dt: datetime.datetime):
    return dt.astimezone(datetime.timezone.utc)
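The lookup that `hash_funcs` performs can be sketched as follows. `toy_hash` and `fully_qualified_name` are hypothetical helpers, not Streamlit's hasher: they check the object's type and then its fully qualified name against the mapping, apply the matching function, and hash the result (here via SHA-256 of its `repr`, an arbitrary choice for the sketch). Both spellings of the mapping in the examples above therefore produce the same hash.

```python
import datetime
import hashlib
from typing import Any, Callable, Dict, Union

HashFuncs = Dict[Union[type, str], Callable[[Any], Any]]


def fully_qualified_name(obj: Any) -> str:
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def toy_hash(obj: Any, hash_funcs: HashFuncs) -> str:
    """Hypothetical hasher: apply a user hash function if the type matches."""
    for key in (type(obj), fully_qualified_name(obj)):
        if key in hash_funcs:
            obj = hash_funcs[key](obj)  # e.g. datetime -> ISO 8601 string
            break
    return hashlib.sha256(repr(obj).encode()).hexdigest()


dt = datetime.datetime(2024, 1, 1, 12, 0)
by_type = toy_hash(dt, {datetime.datetime: lambda d: d.isoformat()})
by_name = toy_hash(dt, {"datetime.datetime": lambda d: d.isoformat()})
print(by_type == by_name)  # True
```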


Warning

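st.cache_data implicitly uses the pickle module, which is known to be insecure. Anything your cached function returns is pickled and stored, then unpickled on retrieval. Ensure your cached functions return trusted values, because it is possible to construct malicious pickle data that will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source in an unsafe mode, or that could have been tampered with. Only load data you trust.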
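The following stdlib-only snippet illustrates the danger: an object's `__reduce__` method tells pickle which callable to invoke at load time, so unpickling a crafted blob runs code. `Payload` is a hypothetical name and the payload is a harmless `eval("6 * 7")`. Note that the blob references only `builtins.eval`, so whoever loads it does not even need the `Payload` class.

```python
import pickle


class Payload:
    """Hypothetical object whose unpickling runs code chosen by its author."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). A malicious blob could name
        # any importable callable and arguments here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
print(pickle.loads(blob))  # 42: the expression ran during unpickling
```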

st.cache_data.clear

Clear all in-memory and on-disk data caches.

Function signature
st.cache_data.clear()

Example

In the example below, pressing the "Clear All" button will clear memoized values from all functions decorated with @st.cache_data.

import streamlit as st

@st.cache_data
def square(x):
    return x**2

@st.cache_data
def cube(x):
    return x**3

if st.button("Clear All"):
    # Clear values from *all* in-memory and on-disk data caches,
    # i.e. clear values from both square and cube.
    st.cache_data.clear()

CachedFunc.clear

Clear the cached function's associated cache.

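If no arguments are passed, Streamlit will clear all values cached for the function. If arguments are passed, Streamlit will clear the cached value for these arguments only.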

Function signature
CachedFunc.clear(*args, **kwargs)
Parameters
*args (Any)	Arguments of the cached function.
**kwargs (Any)	Keyword arguments of the cached function.

Example

import streamlit as st
import time

@st.cache_data
def foo(bar):
    time.sleep(2)
    st.write(f"Executed foo({bar}).")
    return bar

if st.button("Clear all cached values for `foo`"):
    foo.clear()

if st.button("Clear the cached value of `foo(1)`"):
    foo.clear(1)

foo(1)
foo(2)
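The clearing semantics described above (no arguments clears everything, arguments clear one entry) can be mimicked with a toy memoizer. `toy_cache` is hypothetical, keys on positional arguments only, and is not Streamlit's `CachedFunc`.

```python
import functools
from typing import Any, Callable, Dict, Tuple


def toy_cache(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical memoizer whose .clear() mimics the documented semantics."""
    store: Dict[Tuple[Any, ...], Any] = {}

    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        if args not in store:
            store[args] = func(*args)
        return store[args]

    def clear(*args: Any) -> None:
        if args:
            store.pop(args, None)  # clear only the entry for these arguments
        else:
            store.clear()  # no arguments: clear every entry for this function

    wrapper.clear = clear  # type: ignore[attr-defined]
    wrapper.store = store  # type: ignore[attr-defined]  # exposed for inspection
    return wrapper


@toy_cache
def square(x: int) -> int:
    return x * x


square(1)
square(2)
square.clear(1)  # drops only the entry for square(1)
after_one = dict(square.store)
square.clear()   # drops every entry for square
print(after_one)     # {(2,): 4}
print(square.store)  # {}
```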

Using Streamlit commands in cached functions

Static elements

Since version 1.16.0, cached functions can contain Streamlit commands! For example, you can do this:

@st.cache_data
def get_api_data():
    data = api.get(...)
    st.success("Fetched data from API!")  # 👈 Show a success message
    return data

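As we know, Streamlit only runs this function if it hasn't been cached before. On this first run, the st.success message will appear in the app. But what happens on subsequent runs? It still shows up! Streamlit realizes that there is an st. command inside the cached function, saves it during the first run, and replays it on subsequent runs. Replaying static elements works for both caching decorators.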
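A toy model of element replay, assuming a hypothetical `success` function that appends to a list standing in for the app's output. The hypothetical `replaying_cache` decorator records what the function emitted on a miss and re-emits it on a hit. In real Streamlit the app output is rebuilt on every rerun; here both calls write to the same list so the replay is visible.

```python
from typing import Any, Callable, Dict, List, Tuple

app_output: List[str] = []  # stands in for what the app displays


def success(message: str) -> None:
    """Hypothetical stand-in for st.success: just records the message."""
    app_output.append(message)


def replaying_cache(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical cache that records emitted messages and replays them."""
    store: Dict[Tuple[Any, ...], Tuple[Any, List[str]]] = {}

    def wrapper(*args: Any) -> Any:
        if args in store:
            value, messages = store[args]
            app_output.extend(messages)  # hit: replay the recorded elements
            return value
        start = len(app_output)
        value = func(*args)
        store[args] = (value, app_output[start:])  # miss: capture what func emitted
        return value

    return wrapper


runs = []


@replaying_cache
def get_api_data():
    runs.append("executed")
    success("Fetched data from API!")
    return [1, 2, 3]


first = get_api_data()   # first run: body executes and shows the message
second = get_api_data()  # cached: body skipped, message replayed
print(runs)        # ['executed']
print(app_output)  # ['Fetched data from API!', 'Fetched data from API!']
```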

You can also use this functionality to cache entire parts of your UI:

@st.cache_data
def show_data():
    st.header("Data analysis")
    data = api.get(...)
    st.success("Fetched data from API!")
    st.write("Here is a plot of the data:")
    st.line_chart(data)
    st.write("And here is the raw data:")
    st.dataframe(data)

Input widgets

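You can also use interactive input widgets like st.slider or st.text_input in cached functions. Widget replay is an experimental feature, and, as the experimental_allow_widgets entry above notes, it was removed in version 1.38. In earlier versions, you enable it by setting the experimental_allow_widgets parameter: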

@st.cache_data(experimental_allow_widgets=True)  # 👈 Set the parameter
def get_data():
    num_rows = st.slider("Number of rows to get")  # 👈 Add a slider
    data = api.get(..., num_rows)
    return data

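Streamlit treats the slider like an additional input parameter to the cached function. If you change the slider position, Streamlit will check whether it has already cached the function for this slider value. If yes, it will return the cached value. If not, it will rerun the function using the new slider value.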

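Using widgets in cached functions is extremely powerful because it lets you cache entire parts of your app. But it can be dangerous! Since Streamlit treats the widget value as an additional input parameter, it can easily lead to excessive memory usage. Imagine your cached function has five sliders and returns a 100 MB DataFrame. Then we'll add 100 MB to the cache for every combination of these five slider values, even if the sliders do not influence the returned data! These additions can make your cache explode very quickly. Please be aware of this limitation if you use widgets in cached functions. We recommend using this feature only for isolated parts of your UI where the widgets directly influence the cached return value.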
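The memory estimate above is simple arithmetic. The numbers below are hypothetical (ten possible positions per slider), and the last lines show how a bound such as `max_entries` caps the worst case.

```python
# Hypothetical numbers: five sliders with 10 positions each, and a cached
# function that returns a 100 MB DataFrame for every combination.
SLIDERS = 5
POSITIONS_PER_SLIDER = 10
ENTRY_MB = 100

combinations = POSITIONS_PER_SLIDER ** SLIDERS  # 10**5 = 100,000 cache keys
worst_case_mb = combinations * ENTRY_MB         # 10,000,000 MB
print(f"{combinations:,} entries, about {worst_case_mb / 1_000_000:.0f} TB")

# With a bound like max_entries=20, at most 20 entries are kept at once.
MAX_ENTRIES = 20
bounded_mb = min(combinations, MAX_ENTRIES) * ENTRY_MB  # 2,000 MB
```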


Warning

Support for widgets in cached functions is currently experimental. We may change or remove it at any time without warning. Please use it with care!


Note

Two widgets are currently not supported in cached functions: st.file_uploader and st.camera_input. We may support them in the future. Feel free to open a GitHub issue if you need them!
