DuckDB + PySpark (dagster-duckdb-pyspark)

This library integrates Dagster with the DuckDB database and the PySpark data processing library.

class dagster_duckdb_pyspark.DuckDBPySparkTypeHandler(*args, **kwds)[source]

Stores PySpark DataFrames in DuckDB.

Note: This type handler can only store outputs. It cannot currently load inputs.

To use this type handler, pass it to build_duckdb_io_manager.

Example

from dagster import asset, repository, with_resources
from dagster_duckdb import build_duckdb_io_manager
from dagster_duckdb_pyspark import DuckDBPySparkTypeHandler

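@asset
def my_table():
    # Build and return a PySpark DataFrame here; the I/O manager stores it in DuckDB.
    ...

# Create a DuckDB I/O manager that can store PySpark DataFrames.
duckdb_io_manager = build_duckdb_io_manager([DuckDBPySparkTypeHandler()])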

@repository
def my_repo():
    # Attach the I/O manager to the asset and point it at a DuckDB database file.
    return with_resources(
        [my_table],
        {"io_manager": duckdb_io_manager.configured({"database": "my_db.duckdb"})},
    )
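
The following is a toy model of the pattern used above: a list of type handlers is passed to a builder, the resulting I/O manager picks a handler from the Python type of each output, and a store-only handler refuses to load inputs. It is a sketch for intuition only. Every name in it (FakeFrame, StoreOnlyHandler, ToyIOManager, build_toy_io_manager) is hypothetical and is not part of dagster, dagster-duckdb, or dagster-duckdb-pyspark. The real handler writes to a DuckDB database, which this sketch replaces with an in-memory dict.

```python
# Illustrative model only: all classes and functions here are hypothetical
# stand-ins, not the actual Dagster or DuckDB implementation.
from typing import Any, Dict, List, Sequence, Tuple, Type

Row = Tuple[Any, ...]


class FakeFrame:
    """Stand-in for a PySpark DataFrame so the sketch needs no Spark install."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows


class StoreOnlyHandler:
    """Toy type handler that can store FakeFrame outputs but cannot load them."""

    supported_types: Tuple[Type, ...] = (FakeFrame,)

    def handle_output(self, tables: Dict[str, List[Row]], name: str, obj: FakeFrame) -> None:
        # Persist a copy of the rows under the given table name.
        tables[name] = list(obj.rows)

    def load_input(self, tables: Dict[str, List[Row]], name: str) -> Any:
        # Mirrors the note above: storing works, loading is not supported.
        raise NotImplementedError("this handler can only store outputs")


class ToyIOManager:
    """Toy I/O manager that dispatches on the Python type of the output."""

    def __init__(self, handlers: Sequence[Any]) -> None:
        # The dict plays the role of the database in this sketch.
        self.tables: Dict[str, List[Row]] = {}
        self._by_type = {t: h for h in handlers for t in h.supported_types}

    def handle_output(self, name: str, obj: Any) -> None:
        handler = self._by_type.get(type(obj))
        if handler is None:
            raise TypeError(f"no type handler registered for {type(obj).__name__}")
        handler.handle_output(self.tables, name, obj)

    def load_input(self, name: str, as_type: Type) -> Any:
        return self._by_type[as_type].load_input(self.tables, name)


def build_toy_io_manager(handlers: Sequence[Any]) -> ToyIOManager:
    """Hypothetical analogue of passing type handlers to a builder function."""
    return ToyIOManager(handlers)


io_manager = build_toy_io_manager([StoreOnlyHandler()])
io_manager.handle_output("my_table", FakeFrame([(1, "a"), (2, "b")]))
```

In this model, storing a FakeFrame succeeds, while asking the manager to load "my_table" back as a FakeFrame raises NotImplementedError, which is the same limitation the note above describes for PySpark DataFrames.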