In the OLTP world, NULL is virtually costless — and often helpful. It preserves semantic nuance in your data: NULL doesn’t mean FALSE or 0; it just means unknown or not yet provided.

Because OLTP systems typically read or mutate individual rows rather than scan entire tables, the overhead of storing or checking for NULLs is negligible in most workloads. Even if there’s a small cost to skipping or branching on nulls, it’s triggered so rarely (per-row lookups, point updates) that it’s inconsequential.

That’s why most OLTP databases default columns to nullable unless you explicitly define them as NOT NULL.

This is generally great, it is super flexible, and it preserves the semantic meaning of NULL.

Uno reverso: OLAP NULLs

As I learned when working with ClickHouse and MooseStack, this is absolutely not the case whatsoever with OLAP.

OLAP columns assume they don't have NULL in them. They assume this so hard that you have to explicitly tell them that you might have a nullable(Type). And even then, my gut tells me that they don’t like it. You can’t have nullable nested fields. You can’t have nullable keys or order by fields (with most columns), and you need to read an entire column to be able to find nulls within it.

NULLs also hurt the performance of queries significantly.

Everything in the user experience is trying to kindly shove you away from using them.

Why OLAP and NULL aren’t a happy match

See CH docs: https://clickhouse.com/docs/sql-reference/data-types/nullable

Flip the context (from row-based updates to columnar scans) and NULL stops being a free lunch.

The intuitive, high level explanation of why NULLs aren’t well loved in the OLAP world, is that, as discussed in all the previous OLAP on Tap articles, OLAP is generally columnar, and requires reading huge amounts of data from a column efficiently to thrive.

If you have nulls in a column, the data in that column is not all of the same type. You have your typed data, and you have your place-holding sentinel, representing the NULL. Compression breaks. SIMD operations break. You get a bunch of inefficiency.

Under the hood (and I have only a very high level understanding of this, so please correct me if I oversimplified):

ClickHouse stores all data for a column together on disk
Except for NULLs, which it stores differently: it stores a separate map of where each null is in the column, such that other data in the column are all compliant with the type constraints of that column

This means, when the database engine is processing a column with nulls, it isn’t batching vectors and doing SIMD operations on the data:

It is looking at each datapoint in the column
Seeing if that datapoint is a “normal” one, or a null, and branching into one of two operations
The calculation over the column, or the appropriate way to skip that null in the calculation

What you can do instead: default

See CH docs: https://clickhouse.com/docs/sql-reference/statements/create/table#default_values

If the compute and performance cost of adding nullability to your column are too great, or if you need to use the column as a key, or an order by field, you may want to define a default value on write, rather than allow for NULLs.

Trade off: be careful about your default value, and consider the way your data is going to be queried. If you are averaging a column full of numbers, having 20% of them be NULL and ignored by the calculation is very very different to having 20% of them be 0 and ruin your average.

Exception

If your column is mainly NULL, some of the sources I read say 95%+; having a nullable column can actually be quite performant, since that null bitmap will be cheaper than a bunch of 0 data in the column.

TL;DR

Like pretty much everything we learned, you sacrifice write time freedom for read time performance (and in this case, storage efficiency too).

In OLAP, NULL isn’t free — it’s a tab you’ll pay at query time.

Heuristics

Scenario	Recommended Strategy	Rationale / Trade-offs
🟢 Column is always present	`NOT NULL` (default)	Simplest & fastest — no null bitmap, full SIMD & compression
🟢 Occasional blanks, small % (<5%)	Use `DEFAULT` value	Avoids null map overhead; document chosen sentinel (e.g. 0, 'UNKNOWN')
🟡 Semantically meaningful “unknown”	`Nullable(Type)` + guard in query	Preserve semantics, accept slower scans; avoid as key/ORDER BY
🟡 Needs to be key or in ORDER BY	Non-nullable + `DEFAULT`	Keys can’t be `NULL`; encode missing as sentinel
🔴 High frequency of nulls (5–95%)	❌ Avoid. Does this need to be in your OLAP table? Is there a suitable `DEFAULT`?	You’ll pay the branching + null map cost on every vector
🟢 Mostly nulls (95%+)	✅ `Nullable(Type)`	Null bitmap cheaper than storing many zeroes; scans skip efficiently
🟡 Nested or Array fields	❌ Disallow nulls inside	ClickHouse `Nested()` & arrays can’t hold `NULL` — use empty defaults
🟢 JSON or Object('json') columns	Keep path typed, use defaults or `assumeNotNull()`	Subcolumns infer types; avoid per-row null parsing

Agent Docs

🐙 https://github.com/oatsandsugar/olap-agent-ref/blob/main/4-managing-NULL

OLAP on Tap: Much Ado About Nulling

Uno reverso: OLAP NULLs

Why OLAP and NULL aren’t a happy match

What you can do instead: default

Exception

TL;DR

Heuristics

Agent Docs

Interested in learning more?

Related posts

OLAP migration complexity is the cost of fast reads

Ship your data with Moose APIs