This page documents the rules for converting Python objects to DuckDB and DuckDB results to Python.
Object Conversion: Python Object to DuckDB
This is a mapping of Python object types to DuckDB Logical Types:
None->NULLbool->BOOLEANdatetime.timedelta->INTERVALstr->VARCHARbytearray->BLOBmemoryview->BLOBdecimal.Decimal->DECIMAL/DOUBLEuuid.UUID->UUID
The rest of the conversion rules are as follows.
int
Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints. Intead we perform these casts in order until one succeeds:
BIGINTINTEGERUBIGINTUINTEGERDOUBLE
When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.
float
These casts are tried in order until one succeeds:
DOUBLEFLOAT
datetime.datetime
For datetime we will check pandas.isnull if it's available and return NULL if it returns true.
We check against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.
If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.
datetime.time
If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.
datetime.date
date converts to the DATE type.
We check against datetime.date.min and datetime.date.max to convert to -inf and +inf respectively.
bytes
bytes converts to BLOB by default, when it's used to construct a Value object of type BITSTRING, it maps to BITSTRING instead.
list
list becomes a LIST type of the "most permissive" type of its children, for example:
my_list_value = [
12345,
"test"
]
Will become VARCHAR[] because 12345 can convert to VARCHAR but test can not convert to INTEGER.
[12345, test]
dict
The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure.
If the dict has a structure similar to:
my_map_dict = {
"key": [
1, 2, 3
],
"value": [
"one", "two", "three"
]
}
Then we'll convert it to a MAP of key-value pairs of the two lists zipped together.
The example above becomes a MAP(INTEGER, VARCHAR):
{1=one, 2=two, 3=three}
The names of the fields matter and the two lists need to have the same size.
Otherwise we'll try to convert it to a STRUCT.
my_struct_dict = {
1: "one",
"2": 2,
"three": [1, 2, 3],
False: True
}
Becomes:
{'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}
Every
keyof the dictionary is converted to string.
tuple
tuple converts to LIST by default, when it's used to construct a Value object of type STRUCT it will convert to STRUCT instead.
numpy.ndarray and numpy.datetime64
ndarray and datetime64 are converted by calling tolist() and converting the result of that.
Result Conversion: DuckDB Results to Python
DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.
NumPy
fetchnumpy()fetches the data as a dictionary of NumPy arrays
Pandas
df()fetches the data as a Pandas DataFramefetchdf()is an alias ofdf()fetch_df()is an alias ofdf()fetch_df_chunk(vector_multiple)fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).
Apache Arrow
arrow()fetches the data as an Arrow tablefetch_arrow_table()is an alias ofarrow()fetch_record_batch(chunk_size)returns an Arrow record batch reader withchunk_sizerows per batch
Polars
pl()fetches the data as a Polars DataFrame
Below are some examples using this functionality. See the Python guides for more examples.
# fetch as Pandas DataFrame
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
# item value count
# 0 jeans 20.0 1
# 1 hammer 42.2 2
# 2 laptop 2000.0 1
# 3 chainsaw 500.0 10
# 4 iphone 300.0 2
# fetch as dictionary of numpy arrays
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
# {'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
# mask=[False, False, False, False, False],
# fill_value='?',
# dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
# mask=[False, False, False, False, False],
# fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
# mask=[False, False, False, False, False],
# fill_value=999999,
# dtype=int32)}
# fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
# item value count
# 0 jeans 20.00 1
# 1 hammer 42.20 2
# 2 laptop 2000.00 1
# 3 chainsaw 500.00 10
# 4 iphone 300.00 2