ClickHouse · Stanley Jacob

ClickHouse is a columnar database built for online analytical processing (OLAP): aggregations and scans over billions of rows, the kind of query that powers dashboards and analytics. It is the opposite end of the spectrum from a row store like Postgres, and the difference is not incremental, on suitable analytical queries it can be 100 to 1000 times faster (ClickHouse docs).

The reason is columnar storage. Analytical queries touch a few columns of many rows, and storing each column contiguously means a query that reads 3 of 50 columns reads only those 3, skipping over 90 percent of the data; contiguous same-type values also compress far better and vectorize on the CPU. A row store, which interleaves all columns per row, must read everything.

How to use it

The workhorse engine is MergeTree. You give it an ORDER BY sort key, and it keeps data sorted and builds a sparse primary index, one entry per granule of about 8,192 rows, so the index stays tiny and lives in memory even for billions of rows.

CREATE TABLE events (
    ts       DateTime,
    user_id  UInt64,
    event    LowCardinality(String),     -- dictionary-encoded, very compressible
    value    Float64
) ENGINE = MergeTree
ORDER BY (event, ts);                     -- sort key -> sparse primary index

SELECT event, avg(value)
FROM events
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY event;                           -- reads only event, ts, value columns

Key mechanics, and why

The sparse index does not point at individual rows; it narrows the scan to the handful of granules that could match, and those are read and filtered in parallel. The why behind this choice is that OLAP queries return aggregates over ranges, not single rows, so a per-row index would be wasted memory; the sparse index trades pinpoint lookups (which ClickHouse is bad at) for cheap range scans (which it is built for). Background merges combine sorted parts, much like an LSM tree, keeping the layout efficient.

Trade-offs

ClickHouse is superb at reads-mostly analytics and weak at the things a transactional database is built for: single-row point lookups can be milliseconds instead of microseconds, updates and deletes are heavyweight, and there are no rich multi-row transactions. The common architecture is therefore both, Postgres for the transactional workload and ClickHouse fed from it for analytics.

Internals worth knowing

The MergeTree engine is a family, not one engine. Data is written as immutable parts that a background process continually merges, much like an LSM tree, and specialized variants merge with semantics: ReplacingMergeTree deduplicates by key, SummingMergeTree and AggregatingMergeTree pre-aggregate, and CollapsingMergeTree cancels rows. The primary index is sparse (one mark per roughly 8,192-row granule) and sits beside per-column mark files that map granules to compressed blocks, so a query reads only the columns and granules it needs.

Speed comes from vectorized execution, processing columns in cache-friendly batches that use SIMD, plus aggressive compression with codecs like LZ4 (fast) or ZSTD (smaller) and Delta / DoubleDelta for sorted numeric and time columns. Materialized views transform data into roll-up tables at insert time, projections store alternate sort orders, and skip indexes (min-max, set, bloom) prune granules further. Distribution uses ReplicatedMergeTree plus sharding, coordinated by ZooKeeper or the built-in Keeper.

Using it from Python

import clickhouse_connect
client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS events (
        ts DateTime, user_id UInt64,
        event LowCardinality(String), value Float64
    ) ENGINE = MergeTree ORDER BY (event, ts)""")

client.insert("events", rows, column_names=["ts", "user_id", "event", "value"])

res = client.query("SELECT event, avg(value) FROM events GROUP BY event")
print(res.result_rows)

Worked example

A dashboard query that scans a day of events and groups by type touches only three columns and only the matching time-range granules, which is why it returns in milliseconds over billions of rows (illustrative):

SELECT event, count(), avg(value)
FROM events
WHERE ts >= today()
GROUP BY event
ORDER BY count() DESC;

-- event    | count() | avg(value)
-- click    | 4821093 | 0.42
-- view     | 9120388 | 0.07
-- purchase |  118204 | 18.95

In production

ClickHouse has become a default for large-scale observability and analytics. Anthropic runs a self-hosted, air-gapped ClickHouse for its observability stack: when Claude usage took off, the observability team scaled telemetry, metrics, and logs on ClickHouse (ingested with Vector, orchestrated on Kubernetes), and an engineer noted it "played an instrumental role in helping us develop and ship Claude 4" (ClickHouse, 2025). Uber rebuilt its log-analytics platform on ClickHouse, streaming logs from Kafka and flattening them for interactive queries (Uber Engineering), and Cloudflare serves analytics over ClickHouse (Cloudflare). The common thread is enormous append-mostly event volume queried with aggregates.

Follow-up questions

Why is columnar faster for analytics? Queries read only the few columns they need, skipping the rest, and contiguous same-type data compresses and vectorizes well.
What is the sparse primary index? One index entry per granule (~8,192 rows); it narrows a query to candidate blocks rather than locating single rows.
Why is ClickHouse bad at point lookups and updates? Columnar layout and merge-based storage favor range scans; fetching or mutating one row touches many parts and columns.
How do ClickHouse and Postgres fit together? Postgres handles transactional writes; data flows into ClickHouse for fast analytical queries (HTAP via two stores).
Why LowCardinality columns? They dictionary-encode repeated strings, shrinking storage and speeding group-by on fields like event type or country.

References

ClickHouse Documentation.
Stonebraker et al., C-Store: A Column-oriented DBMS (2005).
Kleppmann, Designing Data-Intensive Applications (2017).
ClickHouse, How Anthropic Uses ClickHouse to Scale Observability for the AI Era (2025).
Uber Engineering, Fast and Reliable Schema-Agnostic Log Analytics Platform.
Cloudflare, Log Analytics Using ClickHouse.