Connect Streamlit to Google Cloud Storage

Introduction

This guide explains how to securely access files on Google Cloud Storage from Streamlit Community Cloud. It uses Streamlit FilesConnection, the gcsfs library, and Streamlit's Secrets management.

Create a Google Cloud Storage bucket and add a file

Note

If you already have a bucket that you want to use, feel free to skip to the next step.

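First, sign up for Google Cloud Platform or log in. Go to the Google Cloud Storage console and create a new bucket.

Navigate to the upload section of your new bucket and upload the following CSV file, which contains some example data: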

myfile.csv
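
If the sample file is not at hand, a stand-in can be generated locally. The sketch below is only an illustration: the column names Owner and Pet are inferred from the app code later in this guide, while the rows themselves and the helper name write_sample_csv are made up for this example.

```python
import csv
from pathlib import Path

# Hypothetical rows: only the column names "Owner" and "Pet" are implied by
# the app code in this guide; the values below are invented for illustration.
ROWS = [
    {"Owner": "alice", "Pet": "dog"},
    {"Owner": "bob", "Pet": "cat"},
]


def write_sample_csv(path: str = "myfile.csv") -> Path:
    """Write a small CSV with Owner and Pet columns and return its path."""
    target = Path(path)
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Owner", "Pet"])
        writer.writeheader()
        writer.writerows(ROWS)
    return target
```

Calling write_sample_csv() creates myfile.csv in the current directory, ready to upload to the bucket.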

Enable the Google Cloud Storage API

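The Google Cloud Storage API is enabled by default when you create a project through the Google Cloud Console or CLI. Feel free to skip to the next step.

If you do need to enable the API for programmatic access in your project, go to the APIs & Services dashboard (select or create a project if asked). Search for the Cloud Storage API and enable it. If the page shows a blue "Manage" button and indicates that the API is enabled, no further action is needed. This is very likely what you will see, since the API is enabled by default. If you see an "Enable" button instead, you need to enable the API: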

Create a service account and key file

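To use the Google Cloud Storage API from Streamlit, you need a Google Cloud Platform service account (a special account type for programmatic data access). Go to the Service Accounts page and create an account with Viewer permission.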

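Note

If the CREATE SERVICE ACCOUNT button is grayed out, you don't have the correct permissions. Ask the admin of your Google Cloud project for help.

After clicking DONE, you should be back on the service accounts overview. Create a JSON key file for the new account and download it: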

Add the key to your local app secrets

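Your local Streamlit app reads secrets from the file .streamlit/secrets.toml in your app's root directory. Create this file if it doesn't exist yet and add the access key to it as shown below: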

# .streamlit/secrets.toml

[connections.gcs]
type = "service_account"
project_id = "xxx"
private_key_id = "xxx"
private_key = "xxx"
client_email = "xxx"
client_id = "xxx"
auth_uri = "https://accounts.google.com/o/oauth2/auth"
token_uri = "https://oauth2.googleapis.com/token"
auth_provider_x509_cert_url = "https://www.googleapis.com/oauth2/v1/certs"
client_x509_cert_url = "xxx"

Important

Add this file to .gitignore and don't commit it to your GitHub repo!
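
Copying each field from the downloaded JSON key file by hand is error-prone, especially the multi-line private key. As a minimal sketch, assuming the key file is a flat JSON object of string values (as the fields listed above suggest), the following renders it as a [connections.gcs] table. The function names and the file path in the usage note are hypothetical, and the output should be checked before use.

```python
import json
from pathlib import Path


def service_account_to_toml(key: dict, section: str = "connections.gcs") -> str:
    """Render a flat service account key dict as a TOML table."""
    lines = [f"[{section}]"]
    for name, value in key.items():
        # json.dumps yields a double-quoted string; its escapes for quotes,
        # backslashes, and newlines are also valid in TOML basic strings.
        # This is assumed to be sufficient for the plain-text values of a
        # typical key file, not for arbitrary input.
        lines.append(f"{name} = {json.dumps(str(value), ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


def convert_key_file(json_path: str) -> str:
    """Read a JSON key file from disk and return the TOML text."""
    key = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return service_account_to_toml(key)
```

For example, print(convert_key_file("service-account.json")) writes the table to the terminal, from where it can be pasted into .streamlit/secrets.toml. Avoid redirecting the output into a file that is tracked by Git.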

Copy your app secrets to the cloud

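Because the secrets.toml file above is not committed to GitHub, you need to pass its content to your deployed app (on Streamlit Community Cloud) separately. Go to the app dashboard and, in the app's dropdown menu, click Edit Secrets. Copy the content of secrets.toml into the text area. More information is available at Secrets management.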

Add FilesConnection and gcsfs to your requirements file

Add the FilesConnection and gcsfs packages to your requirements.txt file, preferably pinning their versions (replace x.x.x with the version you want installed):

# requirements.txt
gcsfs==x.x.x
st-files-connection
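
One way to choose the versions to pin is to read them from the environment where the app already works locally. The sketch below uses the standard library's importlib.metadata for that; the helper name pinned_requirements is hypothetical, and the package names are taken from the requirements file above.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return one 'name==version' line per installed package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Emit a comment line instead of guessing a version.
            lines.append(f"# {name} is not installed in this environment")
    return lines
```

Running print("\n".join(pinned_requirements(["gcsfs", "st-files-connection"]))) prints lines that can be copied into requirements.txt.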

Write your Streamlit app

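Copy the code below to your Streamlit app and run it. Make sure to adapt the names of your bucket and file. Note that Streamlit automatically turns the access keys from your secrets file into environment variables.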

# streamlit_app.py

import streamlit as st
from st_files_connection import FilesConnection

# Create connection object and retrieve file contents.
# Specify input format is a csv and to cache the result for 600 seconds.
conn = st.connection('gcs', type=FilesConnection)
df = conn.read("streamlit-bucket/myfile.csv", input_format="csv", ttl=600)

# Print results.
for row in df.itertuples():
    st.write(f"{row.Owner} has a :{row.Pet}:")

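See st.connection above? It handles secrets retrieval, setup, result caching, and retries. By default, read() results are cached without expiring. In this example, we set ttl=600 to ensure the file contents are cached for no longer than 10 minutes. You can also set ttl=0 to disable caching. Learn more in Caching.

If everything worked out (and you used the example file given above), your app should look like this: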
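
To make the three ttl settings described above concrete, here is a small standalone sketch of time-to-live caching. It is not how Streamlit implements its cache; the TTLCache class and its get method are hypothetical and only mirror the described behavior: no ttl means entries never expire, ttl=600 means entries are reused for up to 600 seconds, and ttl=0 means nothing is cached.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy cache illustrating ttl semantics (not Streamlit's implementation)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any], ttl: Optional[float] = None) -> Any:
        now = self._clock()
        if ttl != 0 and key in self._entries:
            stored_at, value = self._entries[key]
            # ttl=None never expires; otherwise reuse only while still fresh.
            if ttl is None or now - stored_at < ttl:
                return value
        value = loader()
        if ttl != 0:  # ttl=0 disables caching entirely
            self._entries[key] = (now, value)
        return value
```

With ttl=600, a second call within ten minutes returns the stored value without invoking the loader again; after that window, the loader runs once more and the entry is refreshed.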
