Reading and writing files

Reading spatial data

GeoPandas can read almost any vector-based spatial data format, including ESRI shapefile and GeoJSON files, using the geopandas.read_file() command:

geopandas.read_file(...)

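This returns a GeoDataFrame object. It is possible because GeoPandas relies on GDAL/OGR, a large open source library designed to facilitate spatial data transformations, through the Python packages Pyogrio or Fiona, both of which provide bindings to GDAL.

Any arguments passed to geopandas.read_file() after the file name are passed directly to pyogrio.read_dataframe() or fiona.open(), which do the actual data import. In general, geopandas.read_file() is pretty smart and should do what you want without extra arguments, but for more help, type: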

import pyogrio; help(pyogrio.read_dataframe)
import fiona; help(fiona.open)

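Note

For faster data reading with the default pyogrio engine, pass use_arrow=True. This can be 2-4 times faster than the default reading behavior and works with all drivers. See pyogrio.read_dataframe for full details.

Note that this requires the pyarrow dependency to be present in your environment.

Among other things, you can explicitly set the driver (shapefile, GeoJSON) with the driver keyword, or pick a single layer from a multi-layered file with the layer keyword: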

countries_gdf = geopandas.read_file("package.gpkg", layer='countries')

If you have a file containing multiple layers, you can list them using geopandas.list_layers(). Note that this function requires Pyogrio.

GeoPandas can also load resources directly from a web URL, for example a GeoJSON file from geojson.xyz:

url = "http://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson"
df = geopandas.read_file(url)

You can also load ZIP files that contain your data:

zipfile = "zip:///Users/name/Downloads/cb_2017_us_state_500k.zip"
states = geopandas.read_file(zipfile)

If the dataset is in a folder inside the ZIP file, you have to append its name:

zipfile = "zip:///Users/name/Downloads/gadm36_AFG_shp.zip!data"

If there are multiple datasets in a folder in the ZIP file, you also have to specify the filename:

zipfile = "zip:///Users/name/Downloads/gadm36_AFG_shp.zip!data/gadm36_AFG_1.shp"

It is also possible to read any file-like object with a read() method, such as a file handle (e.g. one returned by the built-in open() function) or StringIO:

filename = "test.geojson"
file = open(filename)
df = geopandas.read_file(file)
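An in-memory buffer works the same way as a file handle. The sketch below uses io.BytesIO with a small, made-up GeoJSON file written to a temporary directory; the data and file name are assumptions for illustration.

```python
import io
import tempfile
from pathlib import Path

import geopandas
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"]}, geometry=[Point(0, 0), Point(1, 1)], crs="EPSG:4326"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.geojson"
    gdf.to_file(path, driver="GeoJSON")
    # Copy the raw bytes into memory; the file on disk is no longer needed.
    buffer = io.BytesIO(path.read_bytes())

# Any object exposing read() can be handed to read_file.
from_buffer = geopandas.read_file(buffer)
print(from_buffer)
```
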

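File-like objects from fsspec can also be used to read data, allowing for any combination of storage backends and caching supported by that project:

import fsspec

path = "simplecache::http://download.geofabrik.de/antarctica-latest-free.shp.zip"
with fsspec.open(path) as file:
    df = geopandas.read_file(file)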

You can also read path objects:

import pathlib
path_object = pathlib.Path(filename)
df = geopandas.read_file(path_object)

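Reading faster with Arrow

Arrow-based reading (use_arrow=True) can also be enabled by default by setting the environment variable PYOGRIO_USE_ARROW=1. This also enables writing data via Arrow.

Note that this requires the pyarrow dependency to be present in your environment.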

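Reading subsets of the data

Since GeoPandas is powered by GDAL, you can take advantage of pre-filtering when loading larger datasets. This can be done geospatially with a geometry or a bounding box. You can also filter the rows that are loaded with a slice. See geopandas.read_file() for details.

Geometry filter

The geometry filter only loads data that intersects with the geometry.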

import geodatasets

gdf_mask = geopandas.read_file(
    geodatasets.get_path("geoda.nyc")
)
gdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc education"),
    mask=gdf_mask[gdf_mask.name=="Coney Island"],
)

Bounding box filter

The bounding box filter only loads data that intersects with the bounding box.

bbox = (
    1031051.7879884212, 224272.49231459625, 1047224.3104931959, 244317.30894023244
)
gdf = geopandas.read_file(
    geodatasets.get_path("nybb"),
    bbox=bbox,
)

Row filter

Filter the rows loaded from the file using an integer (for the first n rows) or a slice object.

gdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc"),
    rows=10,
)
gdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc"),
    rows=slice(10, 20),
)

Field/column filters

Load a subset of fields from the file using the columns keyword (this requires pyogrio or Fiona 1.9+):

gdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc"),
    columns=["name", "rent2008", "kids2000"],
)

Skip loading geometry from the file:

Note

Returns a pandas.DataFrame.

pdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc"),
    ignore_geometry=True,
)

SQL WHERE filter

Added in version 0.12.

Load a subset of the data with a SQL WHERE clause.

Note

Requires Fiona 1.9+ or the pyogrio engine.

gdf = geopandas.read_file(
    geodatasets.get_path("geoda.nyc"),
    where="subborough='Coney Island'",
)

Supported drivers / file formats

When using pyogrio, all drivers supported by the GDAL installation are enabled, and you can check them with:

import pyogrio; pyogrio.list_drivers()

The values indicate whether reading, writing, or both are supported for a given driver.
Fiona only exposes a default subset of drivers. To display those, type:

import fiona; fiona.supported_drivers

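There is a list of available drivers that are not exposed by default but may be supported (depending on the GDAL build). You can activate these at runtime by updating the supported_drivers dictionary, for example: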

fiona.supported_drivers["NAS"] = "raw"

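Writing spatial data

GeoDataFrames can be exported to many different standard formats using the geopandas.GeoDataFrame.to_file() method. For a full list of supported formats, type import pyogrio; pyogrio.list_drivers().

In addition, GeoDataFrames can be uploaded to a PostGIS database (starting with GeoPandas 0.8) by using the geopandas.GeoDataFrame.to_postgis() method.

Note

For faster data writing with the default pyogrio engine, pass use_arrow=True. This can be 2-4 times faster than the default writing behavior and works with all drivers. See pyogrio.write_dataframe for full details.

Note that this requires the pyarrow dependency to be present in your environment.

Note

A GeoDataFrame can contain more field types than most file formats support. For example, tuples or lists can easily be stored in a GeoDataFrame, but saving them to, for example, GeoPackage or Shapefile raises a ValueError. Before saving to a file, they need to be converted to a format supported by the selected driver.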

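Note

A GeoDataFrame can contain more than one geometry (GeoSeries) column, but most standard GIS file formats, such as GeoPackage or ESRI Shapefile, support only a single geometry column. To store multiple geometry columns, the non-active GeoSeries need to be converted to an alternative representation such as well-known text (WKT) or well-known binary (WKB) before saving to file. Alternatively, they can be saved as an Apache (Geo)Parquet or Feather file, both of which support multiple geometry columns natively.

Writing to Shapefile: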

countries_gdf.to_file("countries.shp")

Writing to Shapefile via Arrow:

countries_gdf.to_file("countries.shp", use_arrow=True)

Writing to GeoJSON:

countries_gdf.to_file("countries.geojson", driver='GeoJSON')

Writing to GeoPackage:

countries_gdf.to_file("package.gpkg", layer='countries', driver="GPKG")
cities_gdf.to_file("package.gpkg", layer='cities', driver="GPKG")

Writing with multiple geometry columns:

countries_gdf["country_center"] = countries_gdf["geometry"].centroid
# Line below fails because GeoJSON can't contain multiple geometry columns
# countries_gdf.to_file("countries.geojson", driver='GeoJSON')
countries_gdf["country_center"] = countries_gdf["country_center"].to_wkt()
countries_gdf.to_file("countries.geojson", driver='GeoJSON')

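For multi-layer formats such as GeoPackage, the additional geometry columns can instead be written to separate layers rather than being saved as WKT or WKB within a single layer.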

Spatial databases

GeoPandas can also get data from a PostGIS database using the geopandas.read_postgis() command.

Writing to PostGIS:

from sqlalchemy import create_engine
db_connection_url = "postgresql://myusername:mypassword@myhost:5432/mydatabase"
engine = create_engine(db_connection_url)
countries_gdf.to_postgis("countries_table", con=engine)
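Reading the table back could look roughly like the sketch below. The helper load_countries is hypothetical, and it assumes the table was written with to_postgis() as above, so that the geometry column carries the GeoDataFrame's geometry column name ("geometry"). It is defined as a function because it needs a live database connection to run.

```python
import geopandas


def load_countries(engine, table="countries_table"):
    """Read a PostGIS table into a GeoDataFrame.

    `engine` is a SQLAlchemy engine such as the one created above;
    `table` is assumed to be a trusted, hard-coded table name.
    """
    sql = f"SELECT * FROM {table}"
    # geom_col names the column that holds the geometries.
    return geopandas.read_postgis(sql, con=engine, geom_col="geometry")
```

With the engine from the writing example, load_countries(engine) would return the stored rows as a GeoDataFrame.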

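Apache Parquet and Feather file formats

Added in version 0.8.0.

GeoPandas supports writing and reading the Apache Parquet (GeoParquet) and Feather file formats.

Apache Parquet is an efficient, columnar storage format (originating from the Hadoop ecosystem). It is a widely used binary file format for tabular data. The Feather file format is the on-disk representation of the Apache Arrow memory format, an open standard for in-memory columnar data.

The geopandas.read_parquet(), geopandas.read_feather(), geopandas.GeoDataFrame.to_parquet() and geopandas.GeoDataFrame.to_feather() methods enable fast round trips between GeoPandas and those binary file formats, preserving the spatial information.

Note

The GeoParquet specification is developed at: opengeospatial/geoparquet.

By default, the latest version is used when writing files, but an older version can be specified using the schema_version keyword. GeoPandas supports reading files encoded using any GeoParquet version.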
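A minimal round-trip sketch with two geometry columns follows. It only runs the Parquet part when pyarrow is importable, and the value "1.0.0" for schema_version assumes a GeoPandas release that supports that GeoParquet version; the data is made up for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

import geopandas
from shapely.geometry import box

# Parquet and Feather support relies on the optional pyarrow dependency.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None

countries = geopandas.GeoDataFrame(
    {"name": ["a"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857"
)
countries["country_center"] = countries.geometry.centroid

restored = None
if has_pyarrow:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "countries.parquet"
        # Both geometry columns are stored natively; no WKT conversion needed.
        countries.to_parquet(path)
        restored = geopandas.read_parquet(path)

        # Pin an older GeoParquet version for consumers that need it.
        countries.to_parquet(
            Path(tmp) / "countries_v1.parquet", schema_version="1.0.0"
        )

print(restored)
```
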
