Ephemeral vs. Incremental Models in dbt: When to Use Each
In modern data transformation tools like dbt (data build tool), ephemeral and incremental models are two powerful materialisation strategies that help balance performance, cost, and simplicity. Understanding their differences is key to building efficient data pipelines.
Ephemeral Models
Ephemeral Models are not materialised as physical tables or views in your database. Instead, they are inlined as Common Table Expressions (CTEs) wherever they are referenced.
Key Characteristics:
No storage: No table or view is created.
Recomputed on every run: The logic is executed every time the model is referenced.
Great for reusable logic: Ideal for intermediate transformations.
When to Use:
Simple transformations or filtering that will be reused.
You want to avoid cluttering your schema with intermediate tables.
The computation is lightweight and fast.
Example (models/stg_orders.sql):
-- config: materialized=’ephemeral’
SELECT
order_id,
customer_id,
order_date,
status,
amount
FROM {{ source(’raw’, ‘orders’) }}
WHERE status != ‘cancelled’This model will be inlined into any downstream model that references {{ ref(’stg_orders’) }}.
Incremental Models
Incremental models build on data over time by processing only new or changed records on each run. They maintain a complete history in a table but avoid reprocessing old data.
Key Characteristics:
Materialised as a table.
Only new data is appended (or updated) on each run.
Requires a unique key and incremental predicate (usually based on a timestamp).
When to Use:
Large datasets where a full refresh is expensive.
You need historical data but want fast daily updates.
Event/streaming data (e.g., logs, transactions).
Example (models/fact_orders.sql):
-- config: materialized=’incremental’, unique_key=’order_id’
SELECT
order_id,
customer_id,
order_date,
amount,
CURRENT_TIMESTAMP() AS processed_at
FROM {{ ref(’stg_orders’) }}
{% if is_incremental() %}
-- Only process new records
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}On first run: full load.
On subsequent runs: only new order_date records are inserted.
Best Practice Tip
Use ephemeral models as the foundation of your staging layer (stg_*) to keep logic clean and reusable. Then use incremental models in your mart layer for large fact tables that grow daily.
sources
└── stg_orders (ephemeral)
└── fact_daily_orders (incremental)Summary
Ephemeral = logic reuse, no storage
Incremental = performance at scale, full history
Choose ephemeral for simplicity and incremental for efficiency on large, append-only datasets. Use them together to build clean, fast, and scalable data pipelines.
