When I first started dealing with high-velocity telemetry data, I tried using a traditional relational database. It worked until I hit 100 million rows. Suddenly, simple aggregations that should have taken milliseconds took minutes. That’s when I moved to ClickHouse. If you are looking for a ClickHouse tutorial for time series data, you’ve come to the right place.
ClickHouse isn’t just another database; it’s a columnar store designed specifically for OLAP (Online Analytical Processing). For time series data—where you have a timestamp, a few dimensions, and many metrics—this architecture is a game-changer. While you might be considering the best database for real-time analytics, ClickHouse usually wins on raw read speed for massive datasets.
Prerequisites
- Basic knowledge of SQL.
- A running instance of ClickHouse (Docker is the fastest way to start).
- A dataset or a stream of events (JSON or CSV).
Step 1: Designing the Schema for Time Series
In ClickHouse, the way you define your table determines your performance. The most critical part is the MergeTree engine family. For time series, we almost always use MergeTree or its specialized variants.
Here is how I set up a typical metrics table for server monitoring:
CREATE TABLE server_metrics (
    timestamp DateTime64(3, 'UTC'),
    metric_name LowCardinality(String),
    host_id UInt32,
    value Float64
) ENGINE = MergeTree()
ORDER BY (metric_name, host_id, timestamp);
Why this design?
- LowCardinality(String): I use this for metric_name because there are only a few dozen unique metric names but millions of rows. This saves massive amounts of disk space.
- ORDER BY: This is the primary key. By putting metric_name and host_id first, ClickHouse can skip entire granules of data when you filter by a specific metric or host.
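If you want to verify that the primary key is actually pruning data, recent ClickHouse versions can report index usage via EXPLAIN. A minimal sketch against the table above (host_id 42 is just an example value):

EXPLAIN indexes = 1
SELECT count()
FROM server_metrics
WHERE metric_name = 'cpu_usage'
  AND host_id = 42;

The PrimaryKey section of the output lists selected parts and granules against the totals; the bigger the gap, the more work the index is saving you.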
Step 2: Efficient Data Ingestion
One mistake I see often is inserting data row-by-row. Never do this in ClickHouse. It creates too many small parts on disk, leading to “Too many parts” errors.
Instead, use batch inserts. If you are using Python, use the clickhouse-connect library to send chunks of 10k-100k rows. If you have a high-volume stream, I recommend using the Kafka engine. For those exploring local alternatives, a DuckDB vs MotherDuck comparison might show you how to handle smaller local files before pushing them to a production ClickHouse cluster.
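For the Kafka route, the usual pattern is a Kafka engine table that consumes the stream plus a materialized view that drains it into server_metrics. Here is a sketch assuming a local broker and a JSON-per-line topic; the broker address, topic, and consumer group names are placeholders for your own setup:

CREATE TABLE metrics_queue (
    timestamp DateTime64(3, 'UTC'),
    metric_name String,
    host_id UInt32,
    value Float64
) ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',   -- placeholder broker
    kafka_topic_list = 'server_metrics',    -- placeholder topic
    kafka_group_name = 'clickhouse_ingest', -- placeholder consumer group
    kafka_format = 'JSONEachRow';

-- Acts like a continuously running INSERT ... SELECT: ClickHouse consumes
-- the topic in large blocks, which avoids the small-parts problem entirely.
CREATE MATERIALIZED VIEW metrics_consumer TO server_metrics AS
SELECT timestamp, metric_name, host_id, value
FROM metrics_queue;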
Step 3: Querying Time Series Data
The power of ClickHouse is in its specialized functions. For time series, you’ll spend most of your time with toStartOfHour(), toStartOfMinute(), and aggregations like avg() and max().
Here is a query to get the average CPU usage per hour for the last 24 hours:
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(value) AS avg_cpu
FROM server_metrics
WHERE metric_name = 'cpu_usage'
  AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
ClickHouse answers this by reading only the columns the query touches (metric_name for the filter, then timestamp and value) and skipping everything else, which is why it’s so fast.
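One gotcha with this query: hours that recorded no rows simply vanish from the result, which can leave holes in a chart. Recent ClickHouse versions can pad the gaps with WITH FILL; a sketch:

SELECT
    toStartOfHour(timestamp) AS hour,
    avg(value) AS avg_cpu
FROM server_metrics
WHERE metric_name = 'cpu_usage'
  AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour WITH FILL STEP INTERVAL 1 HOUR;

Filled rows carry default values (0 for avg_cpu), so treat zeros from fill rows as “no data” rather than a real reading.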
Pro Tips for Production
- Use TTLs: Time series data grows indefinitely. Use TTL timestamp + INTERVAL 30 DAY to automatically delete old data.
- Materialized Views: If you have a dashboard that refreshes every second, don’t query the raw table. Create a SummingMergeTree materialized view to pre-aggregate data.
- Compression: Try the ZSTD(1) codec for columns with high redundancy to save more space.
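Here is how those three tips can look in DDL. Treat this as a sketch: the table names ending in _v2 and _hourly are hypothetical, and the TTL window and codec level should match your own retention and CPU budget:

-- Raw table with a TTL and a per-column ZSTD codec.
CREATE TABLE server_metrics_v2 (
    timestamp DateTime64(3, 'UTC'),
    metric_name LowCardinality(String),
    host_id UInt32,
    value Float64 CODEC(ZSTD(1))              -- better ratio, slightly more CPU
) ENGINE = MergeTree()
ORDER BY (metric_name, host_id, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY;  -- rows deleted after 30 days

-- Hourly rollup target for dashboards.
CREATE TABLE server_metrics_hourly (
    hour DateTime('UTC'),
    metric_name LowCardinality(String),
    host_id UInt32,
    value_sum Float64,
    value_count UInt64
) ENGINE = SummingMergeTree()
ORDER BY (metric_name, host_id, hour);

-- Populated automatically on every insert into server_metrics.
CREATE MATERIALIZED VIEW server_metrics_hourly_mv TO server_metrics_hourly AS
SELECT
    toStartOfHour(timestamp) AS hour,
    metric_name,
    host_id,
    sum(value) AS value_sum,
    count() AS value_count
FROM server_metrics
GROUP BY hour, metric_name, host_id;

When you query the rollup, still aggregate (sum(value_sum) / sum(value_count) with a GROUP BY) rather than reading rows raw, because SummingMergeTree only collapses duplicates during background merges.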
Troubleshooting Common Issues
Issue: “Too many parts” error
This happens when you insert data too frequently in small batches. Solution: Increase your batch size or use a buffer table.
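If you cannot fix batching on the client side, a Buffer table in front of the target can absorb small inserts and flush them in bulk. A sketch, assuming server_metrics lives in the default database; the thresholds mirror the style of the documented example and are starting points to tune:

CREATE TABLE server_metrics_buffer AS server_metrics
ENGINE = Buffer(default, server_metrics,
    16,                    -- num_layers
    10, 100,               -- min/max seconds before flush
    10000, 1000000,        -- min/max rows
    10000000, 100000000);  -- min/max bytes

-- Writers insert into server_metrics_buffer; ClickHouse flushes to
-- server_metrics when all min thresholds are met or any max is exceeded.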
Issue: Slow queries on large tables
Check if your WHERE clause matches your ORDER BY key. If you filter by timestamp but it’s the last element in your ORDER BY, ClickHouse has to scan more data than necessary.
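As a concrete illustration against the Step 1 schema:

-- Slow pattern: the leading key columns (metric_name, host_id) are
-- unconstrained, so the timestamp filter scans most granules.
SELECT count()
FROM server_metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR;

-- Fast pattern: pinning the first key column lets ClickHouse prune
-- granules before the timestamp range is even evaluated.
SELECT count()
FROM server_metrics
WHERE metric_name = 'cpu_usage'
  AND timestamp >= now() - INTERVAL 1 HOUR;

If most of your queries are time-first across all metrics, that is a sign your ORDER BY should lead with timestamp instead.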
What’s Next?
Now that you’ve mastered the basics of handling time series data in ClickHouse, I recommend looking into ClickHouse Keeper for clustering and high availability. You can also explore integrating Grafana to visualize your ClickHouse data in real time.