ClickHouse table engines and CDC

Published Wed, October 15, 2025 ∙ OLAP, ClickHouse ∙ by Johanan Ottensooser

In an OLAP (analytical) database, there are no updates, just merges. Every transaction is just writing a new row.

This is fundamental to the performance of OLAP databases, and very different from the update paradigm in OLTP (transactional) databases. It forces design decisions, especially when you are capturing data that includes updates, like CDC (change data capture) from your OLTP database. In ClickHouse, the key design choice is the table engine (plus your sorting key and partitions). In other OLAP systems you’ll tune partitioning, clustering, and dedup/compaction instead of picking an engine.

This article will show you the effect of your choice of ClickHouse table engine, in this case, when you are ingesting CDC data. You can capture changes by reading your OLTP write-ahead log (WAL) (e.g., Postgres WAL via Debezium) and streaming those events into ClickHouse using Redpanda. The table engine you pick determines how inserts/updates/deletes are reconciled into a queryable, up‑to‑date view without constantly doing deduplication at query time.

TL;DR:
  • CDC data is append-only event data, not a simple snapshot of state.
  • ClickHouse handles this best with the right table engine (Replacing, Collapsing, or VersionedCollapsing).
  • Choosing the wrong engine can lead to double-counting or stale rows in your queries.

Recommendation:
When ingesting CDC into ClickHouse, model updates and deletes explicitly and pick a table engine that matches your CDC semantics. MooseStack can handle this logic for you automatically.

We receive data in a form something like the below:

| op | id | name | email | balance | before.balance | source.lsn (might be in a hex pair)* | ts_ms | notes |
|----|----|------|-------|---------|----------------|--------------------------------------|-------|-------|
| c | 1 | Alice | alice@example.com | 100.00 | null | 100001000 | 2025-10-14T14:00:01Z | initial insert |
| u | 1 | Alice | alice@example.com | 125.00 | 100.00 | 100001500 | 2025-10-14T14:00:10Z | balance update |
| u | 1 | Alice | alice@example.com | 150.00 | 125.00 | 100002000 | 2025-10-14T14:00:20Z | another update |
| d | 1 | null | null | null | 150.00 | 100002500 | 2025-10-14T14:00:30Z | delete event |

*LSN is Postgres’ Log Sequence Number: a strictly increasing WAL position, which makes it perfect as a monotonic ver column for ClickHouse engines to order updates.

If you run a naive query to sum the balance, you can very easily foot-gun (the table name below is illustrative):
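```sql
-- Naive aggregate over the raw CDC events (table name accounts_raw is illustrative)
SELECT sum(balance) AS total_balance
FROM accounts_raw;
```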

This returns 375 (even though the final balance is 0 after the delete) because you summed every historical value.

So, how can we turn the above into a set of data that is queryable?

First thing, we use a streaming function to tidy up the data a little (the final form of this tidying may depend on the table engine chosen, as you’ll see below), giving us:

| id | name | email | balance | ver (from lsn) | ts_ms | is_deleted |
|----|------|-------|---------|----------------|-------|------------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 0 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 0 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 0 |
| 1 | Alice | alice@example.com | 150.00 | 100002500 | 2025-10-14T14:00:30Z | 1 |

All we are doing is grabbing the relevant columns, creating the ver column, and explicitly calling out row deletes.
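A minimal sketch of such a transform in TypeScript (the Debezium-style event shape and field names here are assumptions; adapt them to your connector’s payload):

```ts
// Raw Debezium-style CDC event (field names are assumptions)
interface CdcEvent {
  op: "c" | "u" | "d";
  before: { id: number; name: string; email: string; balance: number } | null;
  after: { id: number; name: string; email: string; balance: number } | null;
  source: { lsn: number };
  ts_ms: number;
}

// The tidied row we land in ClickHouse
interface AccountRow {
  id: number;
  name: string;
  email: string;
  balance: number;
  ver: number;
  ts_ms: Date;
  is_deleted: 0 | 1;
}

function tidy(event: CdcEvent): AccountRow {
  // Deletes carry no "after" image, so fall back to the "before" image
  const image = event.op === "d" ? event.before! : event.after!;
  return {
    id: image.id,
    name: image.name,
    email: image.email,
    balance: image.balance,
    ver: event.source.lsn, // the LSN doubles as a monotonic version
    ts_ms: new Date(event.ts_ms),
    is_deleted: event.op === "d" ? 1 : 0,
  };
}
```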

Then, we’ll write them to ClickHouse.

Landing CDC data in ClickHouse

The behaviour of a ClickHouse table is defined by the engine used. We’ll compare the following engines and how each behaves when writing the above CDC data in (and how you’d query from them): MergeTree, ReplacingMergeTree, CollapsingMergeTree, and VersionedCollapsingMergeTree.

MergeTree is a strong default, but not for CDC

MergeTree is ClickHouse’s default table engine, and it provides a great balance of fast inserts and optimized queries. However, it does not update rows automatically.

  • Inserts are fine: a new row is just created.
  • Updates are written as new rows, so you’ll have to deduplicate these rows in your consumption queries by selecting the max ver or latest ts_ms per id (see the sketch after this list).
  • Deletes are not dealt with automatically. Either you run a slow ALTER TABLE … DELETE mutation, or you add logic around the is_deleted column in your query.
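For example, deduplication at read time can lean on argMax (the table name accounts is illustrative):

```sql
-- Latest state per id from a plain MergeTree, resolved at query time
SELECT
    id,
    argMax(name, ver) AS name,
    argMax(email, ver) AS email,
    argMax(balance, ver) AS balance
FROM accounts
GROUP BY id
HAVING argMax(is_deleted, ver) = 0;
```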
| Pros | Cons |
|------|------|
| Fastest writes | Slow reads if you want to deduplicate your data |
| | Easy to make mistakes in your queries or subqueries |
| | Manual deletes if you need real deletes |

Moose OLAP support

Any OLAP table created with Moose OLAP uses MergeTree by default. Just create your OlapTable using defaults (a minimal sketch, assuming the @514labs/moose-lib package and its Key type for the primary key):
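```ts
import { Key, OlapTable } from "@514labs/moose-lib";

interface Account {
  id: Key<number>;
  name: string;
  email: string;
  balance: number;
  ver: number;
  ts_ms: Date;
  is_deleted: number;
}

// No engine specified, so the table defaults to MergeTree
const accounts = new OlapTable<Account>("accounts");
```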

ReplacingMergeTree is strong for most CDC use-cases

ReplacingMergeTree is becoming the community standard for use in CDC. Essentially, it extends the MergeTree engine: duplicate rows are cleaned up on merge. It always keeps the newest version of a row, discarding the older duplicates during background merges.

ReplacingMergeTree deduplicates by the ORDER BY (sorting) key. Among rows with the same sorting key, it keeps the highest ver (or the last inserted if no ver is configured). If you also pass an is_deleted column, a delete is represented as a tombstone (latest version with is_deleted=1). Queries then filter WHERE is_deleted=0 for active rows; physical cleanup of tombstones requires OPTIMIZE … FINAL CLEANUP with the appropriate setting enabled.
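In plain ClickHouse DDL, that configuration looks something like this (table name and column types are illustrative):

```sql
CREATE TABLE accounts
(
    id         UInt64,
    name       String,
    email      String,
    balance    Decimal(18, 2),
    ver        UInt64,
    ts_ms      DateTime64(3),
    is_deleted UInt8
)
ENGINE = ReplacingMergeTree(ver, is_deleted)
ORDER BY id;
```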

  • Inserts are just inserts
  • Updates are inserted as duplicate rows initially. On a merge, the rules above dictate which rows are discarded. This means that if you query before a merge, there is a chance you see duplicate data. Use SELECT … FINAL to deduplicate on read (expensive), or OPTIMIZE TABLE … FINAL to force a merge.
  • Deletes are signalled with the is_deleted flag. On a merge, all previous versions are discarded, but the tombstone row remains. To avoid seeing the deleted row, either filter on this flag in your query, or run OPTIMIZE … FINAL CLEANUP, which gets rid of the tombstone rows too.

Safe query (against the illustrative accounts table from the DDL sketch above):
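```sql
-- FINAL deduplicates at read time; the filter hides tombstoned rows
SELECT id, name, email, balance
FROM accounts FINAL
WHERE is_deleted = 0;
```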

| Pros | Cons |
|------|------|
| Fast writes | Stale versions are visible and queryable until a merge occurs |
| Very easy implementation | Need to be aware of the tombstone approach / have appropriate fields in your ingested data |
| | There are edge cases where you need to understand how your data is partitioned to ensure “finality” of your data |

Moose OLAP support:

To set up an OlapTable with ReplacingMergeTree, we use ClickHouseEngines.ReplacingMergeTree (a sketch, again assuming the @514labs/moose-lib package):
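```ts
import { ClickHouseEngines, Key, OlapTable } from "@514labs/moose-lib";

interface Account {
  id: Key<number>;
  name: string;
  email: string;
  balance: number;
  ver: number;
  ts_ms: Date;
  is_deleted: number;
}

const accounts = new OlapTable<Account>("accounts", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
  ver: "ver",              // keep the row with the highest ver per key
  isDeleted: "is_deleted", // latest version with is_deleted=1 acts as a tombstone
});
```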

This requires an ORDER BY to be specified, and has optional ver and isDeleted parameters for controlling deduplication with more granularity.

CollapsingMergeTree

CollapsingMergeTree uses a Sign column (±1) to cancel rows during merges (in what feels a little bit like double-entry accounting).

  • Inserts are inserts, and have sign = 1 (this is a live row).
  • An update creates two rows: a negative row (sign = -1) with the same data as the previous version of the row, effectively cancelling it out mathematically, and a positive row (sign = 1) representing the new version.
  • Deletes just create a negative row.

On merge, pairs of positive and negative rows are removed.
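A DDL sketch (table name and column types are illustrative):

```sql
CREATE TABLE accounts_collapsing
(
    id      UInt64,
    name    String,
    email   String,
    balance Decimal(18, 2),
    ver     UInt64,
    ts_ms   DateTime64(3),
    sign    Int8
)
ENGINE = CollapsingMergeTree(sign)
ORDER BY id;
```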

This is easier explained with a worked example (I’ll use the same data as above).

First transaction (create) creates the row below:

| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|----|------|-------|---------|----------------|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |

Second transaction (update) creates the second and third rows below:

| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|----|------|-------|---------|----------------|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |

At this moment, if you sum the balance, taking into account the sign, you get the correct balance of 125 (the first and second row cancel).

The third transaction (another update) adds the fourth and fifth rows below, and the fourth transaction (delete) adds the sixth, a lone negative row:

| id | name | email | balance | ver (from lsn) | ts_ms | sign |
|----|------|-------|---------|----------------|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:20Z | -1 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 1 |
| 1 | Alice | alice@example.com | 150.00 | 100002500 | 2025-10-14T14:00:30Z | -1 |

For aggregates, computing with sign (e.g., SUM(balance * sign)) yields the correct result immediately. For row-level views, you’ll still need FINAL (or a GROUP BY that encodes the collapsing) until background merges complete.

Safe query (correct even before merges collapse the row pairs; table name as in the DDL sketch above):
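```sql
-- Sign-aware aggregate; HAVING sum(sign) > 0 drops fully deleted ids
SELECT
    id,
    sum(balance * sign) AS balance
FROM accounts_collapsing
GROUP BY id
HAVING sum(sign) > 0;
```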

| Pros | Cons |
|------|------|
| Explicit deletes | Complex to use / query |
| Immediately correct for certain queries like SUM(value * Sign) | Only immediately correct for certain queries, not arbitrary SELECT *s |
| | Shifts complexity onto the CDC / data pipeline / preparation layer (you need to know / query for the previous version to be able to write the negative row) |
| | “Immediate correctness” requires you don’t have bad luck with partitions |
| | The engine doesn’t track a global version, which can lead to inconsistency |

VersionedCollapsingMergeTree

This engine extends the CollapsingMergeTree engine to solve that final con on the list, adding a version column to the collapsing algorithm so that rows pair up and collapse correctly even when events arrive out of order.
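A DDL sketch (illustrative, as above); the only change is the second engine parameter:

```sql
CREATE TABLE accounts_versioned
(
    id      UInt64,
    name    String,
    email   String,
    balance Decimal(18, 2),
    ver     UInt64,
    ts_ms   DateTime64(3),
    sign    Int8
)
ENGINE = VersionedCollapsingMergeTree(sign, ver)
ORDER BY id;
```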

Moose OLAP support

Given the flexibility of ReplacingMergeTree, we haven’t seen demand for support for CollapsingMergeTree or VersionedCollapsingMergeTree. Interested in having it supported? Let us know.

Recommendation

For almost every CDC use-case, ReplacingMergeTree will be the perfect balance of fast write speed, low write complexity and correctness (albeit eventual correctness). For most analytical use-cases, the potential temporary incorrectness is marginal compared to the size of data being queried. If you need perfect row level data immediately, it may be that your OLTP source would be the better database to query.

If you need strict correctness under out‑of‑order ingestion or in replicated clusters with frequent updates/deletes, VersionedCollapsingMergeTree is the safest collapsing choice, but it introduces operational complexity.

Conclusion

In OLAP databases (specifically, ClickHouse with the table engines analysed above), every transaction is just writing a new row. The way the transactions are dealt with on merge depends on the engine, and you can do some operator math to ensure correctness before a merge.

But all rows are still writes. You can get around this, but you pay the cost of that in complexity at write time or query time.

In tomorrow’s article, we’ll show you how to navigate these complexities in the CDC from OLTP use-case.