dbt-spark
dbt-spark users are recommended to use the Spark session connection method when testing. Together with the pytest Spark plugin, an on-the-fly Spark session removes the need to host Spark.
Installation
Install dbt-spark, pytest-dbt-core, and pytest-spark via pip with:
python -m pip install dbt-spark pytest-dbt-core pytest-spark
Configuration
Configure pytest-spark via the pytest configuration file:
# setup.cfg
[tool:pytest]
spark_options =
    spark.executor.instances: 1
    spark.sql.catalogImplementation: in-memory
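If you use pytest.ini instead of setup.cfg, the same options go under the [pytest] section (a sketch assuming pytest's standard configuration file handling; the option values are copied from above):

```ini
# pytest.ini
[pytest]
spark_options =
    spark.executor.instances: 1
    spark.sql.catalogImplementation: in-memory
```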
Usage
Use the spark_session fixture to set up the unit test for your macro:
import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession


@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_get_tables(
    spark_session: SparkSession, macro_generator: MacroGenerator
) -> None:
    """The get tables macro should return the created table."""
    expected_table = "default.example"
    spark_session.sql(f"CREATE TABLE {expected_table} (id int) USING parquet")
    tables = macro_generator()
    assert tables == [expected_table]
Test
Run pytest via your preferred interface:
pytest