
How to Deploy a ClickHouse Cluster: Complete Setup Guide

Learn how to deploy a production-ready ClickHouse cluster with sharding, replication, and ClickHouse Keeper. Covers architecture decisions, hardware sizing, and step-by-step configuration.

February 17, 2026

ClickHouse is a column-oriented database purpose-built for analytical queries. It can scan billions of rows per second on commodity hardware, and it compresses data aggressively — often 5-10x better than row-oriented databases. That makes it the go-to choice for analytics workloads: event tracking, time-series data, logs, and user behavior analysis.

Deploying ClickHouse in a cluster — rather than a single node — gives you replication for fault tolerance and optional sharding for horizontal scale. This guide walks through how to plan, size, and deploy a production ClickHouse cluster from scratch.

What ClickHouse Is Good For

Before going through setup, it helps to understand what ClickHouse is optimized for:

  • Analytical queries over large datasets — aggregations, GROUP BY, time-range scans across hundreds of millions of rows
  • High-throughput ingestion — millions of rows per second via batch inserts
  • Efficient storage — columnar storage with LZ4 or ZSTD compression dramatically reduces disk usage compared to Postgres or MySQL

It is not a good fit for OLTP workloads — frequent single-row updates, complex transactions, or heavy point lookups. If you need those, keep Postgres for transactional data and use ClickHouse as a read replica or analytics store.
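To make that concrete, here is the shape of query ClickHouse is built for, sketched against a hypothetical events table (table and column names are illustrative):

```sql
-- Daily active users over the last 30 days, scanned from an
-- events table that may hold hundreds of millions of rows
SELECT
    toDate(timestamp) AS day,
    uniqExact(user_id) AS daily_active_users
FROM events
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```

A query like this touches only the timestamp and user_id columns, which is exactly where columnar storage and compression pay off.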

Why Self-Host ClickHouse

The main alternative to running your own cluster is a managed service — ClickHouse Cloud, Altinity.Cloud, or hosted offerings on GCP/AWS. Here is a realistic cost comparison:

| Setup | Specs | Monthly Cost |
| --- | --- | --- |
| ClickHouse Cloud (Development) | 2 replicas, shared resources | ~$100 |
| ClickHouse Cloud (Production) | 3 replicas, 30 vCPU, 120GB RAM | ~$2,000-4,000 |
| Self-hosted on Hetzner (3 nodes) | 3x AX52: 12-core Ryzen, 64GB RAM, 512GB NVMe | ~$135 |
| Self-hosted on Hetzner (6 nodes) | 6x AX52: same specs per node | ~$270 |
| Self-hosted on AWS (3 nodes) | 3x r6i.2xlarge: 8 vCPU, 64GB RAM, EBS | ~$900 |

For teams generating 50-500GB of analytical data per month, self-hosting on bare metal is 10-30x cheaper than ClickHouse Cloud at production tier pricing. The trade-off is that you own operations — backups, upgrades, monitoring, and incident response.

Self-hosting makes the most sense when:

  • You have engineering capacity to manage it (or use automation like sshploy)
  • Your data involves sensitive user information and you want it on your own infrastructure
  • Monthly cloud costs are a meaningful line item
  • You need custom configurations (custom merge tree settings, specific disk layouts, etc.)

Cluster Architecture: Shards and Replicas

Before deploying, you need to understand two concepts: shards and replicas. They are different dimensions of scaling.

Replicas

A replica is a copy of a shard. Each shard can have 1 or more replicas. Replicas serve two purposes:

  1. Fault tolerance — if one node goes down, the other replica still holds the data
  2. Read scaling — queries can be distributed across replicas

Replication in ClickHouse is table-level and uses the ReplicatedMergeTree engine family. Replicas synchronize data via ClickHouse Keeper (more on this below) and direct data transfers between nodes.

Shards

A shard is a horizontal partition of your data. With 2 shards, each shard holds roughly half your data. Queries against a distributed table fan out to all shards and merge the results.

Sharding is appropriate when:

  • A single node cannot hold all your data
  • Query parallelism across shards meaningfully improves performance
  • You are ingesting at rates a single node cannot sustain

For most teams starting out, a 1-shard, 2-replica cluster is the right starting point. You get fault tolerance without the operational complexity of managing distributed queries across shards. Add shards later when you actually need them.

ClickHouse Keeper

ClickHouse Keeper is a distributed coordination service that manages replica metadata and replication queues. It is a purpose-built replacement for ZooKeeper and is now the recommended approach. Keeper uses the Raft consensus algorithm and ships as part of ClickHouse itself — no separate binaries to manage.

A Keeper ensemble requires an odd number of nodes (3 or 5) to maintain quorum. In a 3-node ClickHouse cluster, you typically run one Keeper instance on each ClickHouse node. Keeper is lightweight and has minimal impact on node resources.
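The quorum arithmetic is simple enough to sketch. This illustrative Python snippet shows why even-sized ensembles buy nothing — a 4-node ensemble tolerates no more failures than a 3-node one:

```python
def keeper_fault_tolerance(n_nodes: int) -> int:
    """Number of node failures a Raft ensemble of n_nodes survives.

    Raft needs a strict majority (quorum) of nodes to stay available.
    """
    quorum = n_nodes // 2 + 1
    return n_nodes - quorum

for n in (2, 3, 4, 5):
    print(n, "nodes -> survives", keeper_fault_tolerance(n), "failure(s)")
# 2 nodes survive 0 failures; 3 and 4 both survive 1; 5 survives 2
```

This is why Keeper ensembles are sized at 3 or 5: the extra even-numbered node adds cost without adding fault tolerance.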

Reference Architecture: 3-Node Cluster

A practical minimum production setup:

Node 1: ClickHouse (Shard 1, Replica 1) + Keeper (voter)
Node 2: ClickHouse (Shard 1, Replica 2) + Keeper (voter)
Node 3: ClickHouse (Shard 1, Replica 3) + Keeper (voter)

This gives you:

  • Full data replication across 3 nodes — the cluster survives one node failure
  • Keeper quorum across 3 nodes — coordination survives one node failure
  • Read queries distributed across 3 replicas

To add sharding later, you add more nodes and rebalance.

Hardware Sizing

ClickHouse is disk-heavy and memory-hungry. Here are practical sizing guidelines:

CPU

ClickHouse benefits from high core counts because it parallelizes query execution within a single node. A 12-16 core machine is a reasonable minimum for production. Clock speed matters less than parallelism.

Memory

The general rule is 1-2GB of RAM per billion rows you expect to query frequently. For a typical analytics setup with 10-50 billion rows, 64-128GB RAM per node covers most workloads. If you are running complex aggregations over very wide tables, you may need more.

Storage

Use NVMe SSDs. ClickHouse's performance advantage over other databases is largely defeated by slow disk I/O. For a starting point, estimate your raw data size and multiply by 0.2-0.4 — ClickHouse compression typically brings data down to 20-40% of the uncompressed size. Add 2-3x headroom for merge operations, which temporarily double disk usage.
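The sizing rule above reduces to quick arithmetic. A sketch in Python, where the compression ratio and merge headroom are assumptions you should validate against your own data:

```python
def disk_needed_gb(raw_data_gb: float,
                   compression_ratio: float = 0.3,  # 20-40% of raw is typical
                   merge_headroom: float = 2.5) -> float:
    """Rough disk estimate: compressed size plus headroom for merges."""
    compressed = raw_data_gb * compression_ratio
    return compressed * merge_headroom

# 1 TB of raw events -> ~300 GB compressed -> ~750 GB with merge headroom
print(round(disk_needed_gb(1000)))
```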

A Hetzner AX52 (AMD Ryzen 9 7900X, 12 cores, 64GB RAM, 2x512GB NVMe in RAID-0) at about $45/month is a solid production node for most use cases. Three of these gives you a well-specced 3-node cluster for roughly $135/month.

Network

Replication transfers full data parts between nodes. On inserts, data is written to one replica and then synchronized to others. For high-ingest workloads, your inter-node bandwidth is a bottleneck — 1Gbps is acceptable, 10Gbps is better. Co-locate your cluster nodes in the same datacenter.
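Back-of-envelope math shows why bandwidth matters when you need to rebuild a replica from scratch. A sketch, with an assumed link efficiency factor (numbers are illustrative):

```python
def resync_hours(data_gb: float, link_gbps: float,
                 efficiency: float = 0.6) -> float:
    """Hours to copy a replica's data over a link at a given efficiency."""
    effective_gbps = link_gbps * efficiency
    seconds = (data_gb * 8) / effective_gbps
    return seconds / 3600

# 450 GB of compressed data: roughly 1.7 hours on 1 Gbps vs 0.2 on 10 Gbps
print(round(resync_hours(450, 1), 1), round(resync_hours(450, 10), 1))
```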

Step-by-Step Setup Overview

1. Provision Servers

Provision 3 servers with the same OS (Ubuntu 22.04 LTS is the recommended base) in the same datacenter. Assign them static IPs or stable hostnames. Configure firewall rules:

  • Port 9000 (native TCP protocol) — between ClickHouse nodes and clients
  • Port 8123 (HTTP interface) — for clients using HTTP
  • Port 9009 (inter-server replication) — between ClickHouse nodes
  • Port 9181 (Keeper client port) — for ClickHouse to communicate with Keeper
  • Port 9234 (Keeper Raft port) — between Keeper nodes
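On Ubuntu, locking these ports down to the cluster's private subnet with ufw might look like the following sketch (10.0.0.0/24 stands in for your private network — adjust to yours):

```shell
# Allow cluster traffic only from the private subnet
for port in 9000 8123 9009 9181 9234; do
  ufw allow from 10.0.0.0/24 to any port "$port" proto tcp
done
ufw enable
```

None of these ports should be reachable from the public internet; client access goes through CHProxy (set up later in this guide) or a VPN.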

2. Install ClickHouse

Install from the official ClickHouse repository. On Ubuntu:

apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key \
  | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] \
  https://packages.clickhouse.com/deb stable main" \
  | tee /etc/apt/sources.list.d/clickhouse.list
apt-get update
apt-get install -y clickhouse-server clickhouse-client

Install the same version on all nodes. Mixing versions in a cluster causes replication failures.


3. Configure ClickHouse Keeper

Each node runs a Keeper instance. Add a keeper_config.xml (or include it in config.xml) that defines the Keeper server ID and the full list of Keeper nodes:

<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>  <!-- unique per node: 1, 2, 3 -->
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

    <coordination_settings>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <raft_logs_level>warning</raft_logs_level>
    </coordination_settings>

    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>ch-node-1</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>2</id>
            <hostname>ch-node-2</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>3</id>
            <hostname>ch-node-3</hostname>
            <port>9234</port>
        </server>
    </raft_configuration>
</keeper_server>
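Once Keeper is running on all three nodes, you can sanity-check the ensemble with the four-letter-word commands it inherits from ZooKeeper (enabled by default in recent versions; hostnames are the ones from the config above):

```shell
# "imok" means this Keeper instance is alive
echo ruok | nc ch-node-1 9181

# mntr reports the node's role and raft state; exactly one node
# should report zk_server_state leader
echo mntr | nc ch-node-1 9181 | grep -E 'zk_server_state|zk_synced_followers'
```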

4. Configure the Cluster

Define the cluster topology in config.xml on all nodes. For a 1-shard, 3-replica cluster:

<remote_servers>
    <my_cluster>
        <shard>
            <replica>
                <host>ch-node-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch-node-2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch-node-3</host>
                <port>9000</port>
            </replica>
        </shard>
    </my_cluster>
</remote_servers>

<zookeeper>
    <node>
        <host>ch-node-1</host>
        <port>9181</port>
    </node>
    <node>
        <host>ch-node-2</host>
        <port>9181</port>
    </node>
    <node>
        <host>ch-node-3</host>
        <port>9181</port>
    </node>
</zookeeper>

<macros>
    <shard>01</shard>
    <replica>ch-node-1</replica>  <!-- unique per node -->
</macros>
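After restarting clickhouse-server on all nodes, verify that each node sees the full topology:

```sql
-- Should list all three replicas of shard 1 on every node
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'my_cluster';
```

If a node is missing from the output, check that its config.xml matches the others and that hostnames resolve between nodes.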

5. Create Replicated Tables

With Keeper running and cluster config in place, create tables using the ReplicatedMergeTree engine. The {shard} and {replica} macros are substituted from your config:

CREATE TABLE events ON CLUSTER my_cluster
(
    timestamp   DateTime,
    user_id     UInt64,
    event_type  LowCardinality(String),
    properties  String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp);

Running ON CLUSTER creates the table on all nodes simultaneously.
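To confirm replication is actually working, insert on one node, read from another, and inspect system.replicas (the sample row is illustrative):

```sql
-- On ch-node-1
INSERT INTO events VALUES (now(), 42, 'signup', '{}');

-- On ch-node-2: the row should appear within milliseconds
SELECT count() FROM events;

-- Replication health: queue_size should be near 0, is_readonly 0
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE table = 'events';
```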

6. Set Up CHProxy

CHProxy is an HTTP proxy for ClickHouse that handles load balancing across replicas, connection pooling, query timeouts, and per-user rate limiting. It is the standard way to expose a single endpoint to your application.

# chproxy config
server:
  http:
    listen_addr: ":9090"

users:
  - name: "app_user"
    password: "secret"
    to_cluster: "my_cluster"
    to_user: "default"

clusters:
  - name: "my_cluster"
    nodes:
      - "ch-node-1:8123"
      - "ch-node-2:8123"
      - "ch-node-3:8123"
    users:
      - name: "default"
        password: ""

Your application connects to CHProxy on port 9090. CHProxy round-robins requests across healthy replicas and removes failed nodes from rotation automatically.
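From the application's point of view, CHProxy is plain HTTP. A quick smoke test with curl, using the credentials from the config above (chproxy-host is a placeholder for wherever you run the proxy):

```shell
# hostName() reveals which replica served the request;
# repeated calls should rotate across ch-node-1/2/3
curl -s 'http://chproxy-host:9090/' \
  --user 'app_user:secret' \
  --data-binary 'SELECT hostName(), version()'
```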

Configuration Essentials

Memory Limits

By default, queries can consume memory up to the server-wide cap, which defaults to roughly 90% of physical RAM. Set explicit limits to prevent runaway queries from killing the node. The per-query limit belongs in a settings profile; the older max_memory_usage_for_all_queries setting is obsolete in current releases, so use the server-level max_server_memory_usage instead:

<profiles>
    <default>
        <max_memory_usage>32000000000</max_memory_usage>  <!-- 32GB per query -->
    </default>
</profiles>

<!-- top level of config.xml, not inside profiles -->
<max_server_memory_usage>50000000000</max_server_memory_usage>

Merge Tree Settings

For high-ingest workloads, tuning settings such as parts_to_delay_insert and parts_to_throw_insert controls how ClickHouse responds to the "too many parts" condition that occurs when inserts outpace merges:

<merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    <parts_to_delay_insert>150</parts_to_delay_insert>
    <parts_to_throw_insert>300</parts_to_throw_insert>
</merge_tree>

Compression

Use ZSTD for better compression ratios at the cost of slightly higher CPU. There is no table-level compression setting; codecs are declared per column with CODEC, or set cluster-wide in the <compression> section of config.xml:

CREATE TABLE events
(
    timestamp   DateTime CODEC(DoubleDelta, ZSTD),
    user_id     UInt64 CODEC(ZSTD),
    event_type  LowCardinality(String) CODEC(ZSTD),
    properties  String CODEC(ZSTD(3))
)
ENGINE = ReplicatedMergeTree(...)
...

Backups

Use clickhouse-backup (maintained by Altinity) for automated backups. It supports both local and remote destinations (S3, GCS, Azure Blob):

# Install
curl -L https://github.com/Altinity/clickhouse-backup/releases/latest/download/\
clickhouse-backup-linux-amd64 -o /usr/local/bin/clickhouse-backup
chmod +x /usr/local/bin/clickhouse-backup

# Create a backup
clickhouse-backup create my-backup-2026-02-17

# Upload to S3
clickhouse-backup upload my-backup-2026-02-17

Configure a cron job for daily backups with retention:

# /etc/clickhouse-backup/config.yml
general:
  remote_storage: s3
s3:
  bucket: your-backup-bucket
  path: clickhouse-backups/
  region: eu-central-1
  access_key: ...
  secret_key: ...
clickhouse:
  backup_mutations: true

A reasonable backup schedule: hourly incremental backups, daily full backups, retain 7 days of daily backups and 30 days of weeklies.
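A sketch of the daily job as a /etc/cron.d entry (the create_remote subcommand creates a backup and uploads it in one step; retention is better handled by the backups_to_keep_local and backups_to_keep_remote options in config.yml than by cron):

```shell
# /etc/cron.d/clickhouse-backup -- illustrative schedule
# Daily full backup at 02:00 (% must be escaped in cron)
0 2 * * * root /usr/local/bin/clickhouse-backup create_remote "daily-$(date +\%F)"
```

Test your restore path before you need it: a backup you have never restored is not a backup.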

Monitoring

ClickHouse exposes Prometheus metrics at /metrics on the HTTP port. Scrape them with a standard Prometheus setup and use the official Grafana dashboard (ID: 14192) for a baseline.

Key metrics to alert on:

  • ClickHouseAsyncMetrics_ReplicasMaxAbsoluteDelay — replication lag. Alert if above 300 seconds.
  • ClickHouseMetrics_MemoryTracking — memory in use. Alert if above 80% of node RAM.
  • ClickHouseProfileEvents_MergedRows — merge activity. Sudden spikes indicate insert pressure.
  • ClickHouseAsyncMetrics_NumberOfDetachedParts — detached parts indicate data integrity issues. Should be 0.
  • Keeper metrics: KeeperOutstandingRequests — should stay near 0 under normal operation.
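As a sketch, the replication-lag alert could be expressed as a Prometheus rule like this (the metric name matches what ClickHouse's /metrics endpoint exports; tune the threshold to your workload):

```yaml
groups:
  - name: clickhouse
    rules:
      - alert: ClickHouseReplicationLag
        expr: ClickHouseAsyncMetrics_ReplicasMaxAbsoluteDelay > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag above 300s on {{ $labels.instance }}"
```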

How sshploy Helps

The setup described above involves roughly 500 lines of configuration spread across 3 nodes, in addition to CHProxy setup, Keeper coordination, Ansible or manual provisioning, and monitoring configuration. Getting it right requires understanding ClickHouse internals, and debugging cluster issues — particularly replication lag or Keeper quorum problems — can take hours.

sshploy automates the entire process:

  • Cluster topology selection — choose your shard count, replica count, and node assignments through a UI
  • Automated provisioning — sshploy runs Ansible playbooks that install ClickHouse, configure Keeper, set up inter-node networking, and apply production-safe defaults
  • CHProxy deployment — a properly configured CHProxy is deployed alongside the cluster with load balancing and health checks
  • Backup configuration — clickhouse-backup is configured and scheduled against your S3-compatible storage
  • Monitoring stack — Prometheus scraping and Grafana dashboards are set up out of the box, including the replication lag alerts described above
  • Rollback support — if something goes wrong, sshploy can restore a previous configuration

You connect sshploy to your servers via SSH keys, configure your cluster parameters, and it handles the rest. The full cluster is deployed in under 10 minutes, compared to several hours of manual configuration work.

FAQ

How many nodes do I need to start?

Three nodes is the practical minimum for a production ClickHouse cluster. This gives you 1 shard with 3 replicas and a 3-node Keeper ensemble — the cluster can survive one node failure. Two nodes is insufficient because Keeper requires a majority (quorum) to operate, and with 2 nodes, losing one node causes the whole cluster to stop accepting writes.

Can I run ClickHouse Keeper on separate servers?

Yes. In large deployments, it makes sense to run a dedicated 3-node or 5-node Keeper ensemble separate from your ClickHouse nodes. This reduces resource contention and allows you to scale Keeper independently. For most teams, co-locating Keeper on ClickHouse nodes (one Keeper per ClickHouse node) is simpler and works well.

What is the difference between ReplicatedMergeTree and Distributed tables?

ReplicatedMergeTree handles replication — it keeps data synchronized across replicas using Keeper as a coordination layer. A Distributed table is a virtual table that routes queries and inserts across shards. You typically use both: ReplicatedMergeTree for the underlying storage on each shard, and a Distributed table as the query endpoint that fans out across all shards. Note that ON CLUSTER only runs a statement on every node; it does not create the Distributed table for you, so you create that with a separate CREATE TABLE statement.
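A minimal sketch of the two-table pattern, reusing the events table and my_cluster from this guide (the sharding key rand() and the default database are illustrative choices):

```sql
-- Underlying replicated storage: the events table created earlier
-- via CREATE TABLE ... ON CLUSTER with ReplicatedMergeTree.

-- Query/insert endpoint that fans out across all shards
CREATE TABLE events_distributed ON CLUSTER my_cluster
AS events
ENGINE = Distributed(my_cluster, default, events, rand());
```

With a single shard, the Distributed table adds little; creating it early means queries do not need to change when you add shards later.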

How do I handle schema migrations in a running cluster?

Use ALTER TABLE ... ON CLUSTER cluster_name for schema changes. ClickHouse applies DDL changes asynchronously across nodes via the distributed DDL queue. Most ALTER TABLE operations (adding columns, modifying TTL, adding indexes) are non-blocking in ClickHouse — they do not lock the table during the change. Dropping columns and changing column types are more involved and require care with replication.
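For example, adding a column cluster-wide to the events table from this guide (the column itself is illustrative):

```sql
ALTER TABLE events ON CLUSTER my_cluster
    ADD COLUMN country LowCardinality(String) DEFAULT '';
```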

Is ClickHouse suitable for replacing Elasticsearch for log storage?

Often, yes. ClickHouse handles unstructured log data well when you store logs in a structured format (timestamp, log level, service, message, JSON properties). Query performance for log search over large datasets is typically much better than Elasticsearch, and storage costs are lower due to compression. The main trade-off is that ClickHouse does not have full-text search with relevance scoring — if you need fuzzy search or relevance ranking, Elasticsearch still wins. For structured log filtering and time-range queries, ClickHouse is usually superior.

Ready to deploy?

Skip the manual setup. sshploy handles the entire deployment for you.

Deploy a ClickHouse Cluster