How to Back Up ClickHouse: Complete Backup Guide
Learn how to back up and restore ClickHouse databases. Covers clickhouse-backup, S3 storage, incremental backups, retention policies, and disaster recovery testing.
ClickHouse processes billions of rows per second and compresses data aggressively, which makes it excellent for analytics. It also means losing that data is catastrophic. Rebuilding an analytics dataset from source systems -- event pipelines, CDC streams, log aggregators -- can take days or weeks, assuming the source data even still exists. A solid backup strategy is not optional.
This guide covers how to back up ClickHouse in production: the tooling, storage targets, scheduling, retention, restore procedures, and the mistakes that catch people off guard.
Why Backups Matter for Analytics Databases
Analytics databases have different backup requirements than transactional databases. With PostgreSQL, you worry about losing the current state of business data. With ClickHouse, the concern is volume. A typical ClickHouse deployment holds tens of terabytes of compressed data. The raw, uncompressed source data is often 5-10x larger and may no longer be available in the original pipeline.
Specific risks that backups protect against:
- Accidental DROP TABLE or TRUNCATE -- ClickHouse does not have a recycle bin. These operations are immediate and irreversible.
- Bad migrations -- an ALTER TABLE that drops the wrong column or changes a codec incorrectly can corrupt or lose data across all replicas simultaneously.
- Disk corruption -- even with replication, a bug or misconfiguration that affects all replicas means replication propagates the damage rather than protecting against it.
- Cluster-wide failures -- ClickHouse Keeper quorum loss, network partitions, or botched upgrades can leave a cluster in an unrecoverable state.
Replication is not a backup. Replicas protect against hardware failure on a single node. They do not protect against logical errors, accidental deletes, or corruption that propagates through the replication layer.
Backup Strategies: Full vs. Incremental
Full Backups
A full backup captures every data part for every table. It is the simplest approach and produces a self-contained snapshot you can restore independently. The downside is size -- a ClickHouse node holding 2TB of compressed data produces a 2TB backup, and uploading that to remote storage daily gets expensive in both time and bandwidth.
Full backups are appropriate as a weekly baseline, or every two weeks for slowly changing datasets.
Incremental Backups
An incremental backup captures only the data parts that changed since the last backup. ClickHouse stores data in immutable parts that are periodically merged. Incremental backups leverage this by tracking which parts are new. A daily incremental on a node ingesting 50GB/day might produce a 50-60GB backup instead of the full 2TB.
The trade-off is that restoring from an incremental backup requires the base full backup plus all subsequent incrementals in order. If any link in the chain is corrupted or missing, the restore fails.
A practical strategy: weekly full backups with daily incrementals.
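The chain dependency is worth making concrete: to restore to any incremental, you need the most recent full backup plus every incremental between it and your target, in order. A minimal sketch of that selection logic (the `full-YYYY-MM-DD` / `incr-YYYY-MM-DD` naming scheme is an assumption, matching the examples used later in this guide):

```shell
#!/bin/sh
# restore-chain.sh -- print the backups needed to restore up to a target.
# Assumes names like full-2026-02-16 / incr-2026-02-17, so sorting by
# the date suffix equals sorting chronologically.

restore_chain() {
    target=$1        # e.g. incr-2026-02-19
    shift            # remaining args: all known backup names
    # Sort by date suffix, keep everything up to and including the target
    chain=$(printf '%s\n' "$@" | sort -t- -k2 | awk -v t="$target" '
        { lines[NR] = $0; if ($0 == t) stop = NR }
        END { for (i = 1; i <= stop; i++) print lines[i] }')
    # Walk back to the most recent full, then print the chain forwards
    printf '%s\n' "$chain" | awk '
        /^full-/ { base = NR }
        { lines[NR] = $0 }
        END { for (i = base; i <= NR; i++) print lines[i] }'
}

# Prints the base full followed by the incrementals, oldest first
restore_chain incr-2026-02-19 \
    full-2026-02-16 incr-2026-02-17 incr-2026-02-18 incr-2026-02-19
```

If any name in that printed chain is missing from remote storage, the restore cannot complete -- which is why retention cleanup must never delete a full backup that newer incrementals still reference.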
The clickhouse-backup Tool
clickhouse-backup is an open-source tool maintained by Altinity. It is the standard approach for backing up ClickHouse and supports local snapshots, remote uploads to S3/GCS/Azure Blob, incremental backups, and table-level filtering.
Installation
Download the latest release binary:
LATEST=$(curl -s https://api.github.com/repos/Altinity/clickhouse-backup/releases/latest \
| grep tag_name | cut -d '"' -f 4)
curl -L "https://github.com/Altinity/clickhouse-backup/releases/download/${LATEST}/clickhouse-backup-linux-amd64.tar.gz" \
-o /tmp/clickhouse-backup.tar.gz
tar -xzf /tmp/clickhouse-backup.tar.gz -C /usr/local/bin/
chmod +x /usr/local/bin/clickhouse-backup
Verify the installation:
clickhouse-backup --version
Configuration
Create the configuration file at /etc/clickhouse-backup/config.yml:
general:
remote_storage: s3
max_file_size: 1073741824 # 1GB multipart upload chunk
backups_to_keep_local: 3
backups_to_keep_remote: 0 # managed separately via retention
log_level: info
allow_empty_backups: false
clickhouse:
host: localhost
port: 9000
username: backup_user
password: "your-secure-password"
timeout: 5m
freeze_by_part: false
freeze_by_part_where: ""
backup_mutations: true
skip_tables:
- "system.*"
- "INFORMATION_SCHEMA.*"
- "information_schema.*"
skip_table_engines:
- "Memory"
- "Log"
- "TinyLog"
Create a dedicated ClickHouse user for backups with the minimum required permissions:
CREATE USER backup_user
IDENTIFIED BY 'your-secure-password'
SETTINGS max_execution_time = 600;
GRANT SELECT, SHOW TABLES, ALTER FREEZE PARTITION, SYSTEM UNFREEZE
ON *.* TO backup_user;
Creating a Local Backup
# Full backup
clickhouse-backup create "full-$(date +%Y-%m-%d)"
# List local backups
clickhouse-backup list local
This creates a hard-link snapshot of ClickHouse data parts under /var/lib/clickhouse/backup/. Hard links mean the local backup consumes almost no additional disk space until ClickHouse merges or removes the original parts.
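The hard-link behavior is easy to verify outside ClickHouse. A hard link shares the original file's inode, so the link count rises but no data blocks are duplicated, and the linked copy survives deletion of the original (this sketch uses GNU `stat`; macOS needs `stat -f %l` instead):

```shell
# Create a file and a hard link to it, then inspect the link count.
echo "immutable part data" > part.bin
ln part.bin backup-part.bin      # hard link, not a copy

stat -c '%h' part.bin            # link count is now 2
cmp part.bin backup-part.bin     # identical content, same underlying blocks

# Deleting the original leaves the link intact -- this is why a local
# snapshot survives ClickHouse merging away the source parts.
rm part.bin
cat backup-part.bin
```

This is the same reason the backup directory must live on the same filesystem as the data directory: hard links cannot cross filesystem boundaries.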
Creating an Incremental Backup
# First, create a full backup as the base
clickhouse-backup create "full-2026-02-17"
# Later, create an incremental based on the full
clickhouse-backup create --diff-from-remote="full-2026-02-17" "incr-2026-02-18"
The --diff-from-remote flag tells clickhouse-backup to only include parts that are not already present in the referenced backup. The resulting incremental backup is significantly smaller.
Configuring S3 Storage
Remote storage is essential. Local backups on the same disk as the database are useless if the disk fails. Add the S3 section to your configuration:
s3:
access_key: "AKIAIOSFODNN7EXAMPLE"
secret_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
bucket: "your-clickhouse-backups"
path: "cluster-prod/node-1/"
region: "eu-central-1"
acl: "private"
storage_class: "STANDARD_IA"
compression_format: "tar"
compression_level: 1
sse: ""
disable_ssl: false
max_parts_count: 10000
allow_multipart_download: true
object_disk_path: ""
Key settings:
- storage_class: STANDARD_IA -- Infrequent Access storage is roughly 40% cheaper than Standard for data you rarely read (which describes backups). Use GLACIER or DEEP_ARCHIVE for long-term retention if you can tolerate retrieval delays.
- path -- include the cluster name and node identifier to keep backups organized.
- compression_format: tar -- clickhouse-backup bundles parts into tar archives for upload. ClickHouse data is already compressed (LZ4 or ZSTD), so additional compression yields minimal benefit.
For S3-compatible storage (MinIO, Backblaze B2, Cloudflare R2), add the endpoint override:
s3:
endpoint: "https://s3.us-west-002.backblazeb2.com"
# ... rest of config
Upload and Download
# Upload a local backup to S3
clickhouse-backup upload "full-2026-02-17"
# Download a remote backup to local
clickhouse-backup download "full-2026-02-17"
# List remote backups
clickhouse-backup list remote
Scheduling Backups with Cron
Automate backups with a cron job. This script runs a daily incremental backup, a weekly full backup on Sundays, and uploads each to S3:
#!/bin/bash
# /usr/local/bin/clickhouse-backup-cron.sh
set -euo pipefail
BACKUP_DATE=$(date +%Y-%m-%d-%H%M)
DAY_OF_WEEK=$(date +%u)
LOG="/var/log/clickhouse-backup/backup-${BACKUP_DATE}.log"
mkdir -p /var/log/clickhouse-backup
if [ "$DAY_OF_WEEK" -eq 7 ]; then
BACKUP_NAME="full-${BACKUP_DATE}"
echo "Creating full backup: ${BACKUP_NAME}" >> "$LOG"
clickhouse-backup create "${BACKUP_NAME}" >> "$LOG" 2>&1
clickhouse-backup upload "${BACKUP_NAME}" >> "$LOG" 2>&1
else
# awk instead of grep: with pipefail, a grep that matches nothing would
# exit non-zero and kill the script before the fallback below runs
LAST_FULL=$(clickhouse-backup list remote | awk '/^full-/ {print $1}' | tail -n 1)
if [ -z "$LAST_FULL" ]; then
echo "No full backup found, creating one" >> "$LOG"
BACKUP_NAME="full-${BACKUP_DATE}"
clickhouse-backup create "${BACKUP_NAME}" >> "$LOG" 2>&1
else
BACKUP_NAME="incr-${BACKUP_DATE}"
echo "Creating incremental backup: ${BACKUP_NAME} (diff from ${LAST_FULL})" >> "$LOG"
clickhouse-backup create --diff-from-remote="${LAST_FULL}" "${BACKUP_NAME}" >> "$LOG" 2>&1
fi
clickhouse-backup upload "${BACKUP_NAME}" >> "$LOG" 2>&1
fi
# Old local backups are pruned automatically by clickhouse-backup:
# backups_to_keep_local: 3 in config.yml keeps only the last three
echo "Backup completed: ${BACKUP_NAME}" >> "$LOG"
Add to crontab:
# Run backup daily at 03:00 UTC
0 3 * * * /usr/local/bin/clickhouse-backup-cron.sh
Make the script executable and ensure the clickhouse-backup binary is in the cron user's PATH:
chmod +x /usr/local/bin/clickhouse-backup-cron.sh
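One guard worth adding (not shown in the script above): a slow upload -- a full backup on a constrained link can run past 24 hours -- would otherwise overlap with the next day's run. Wrapping the crontab entry in flock skips a run while the previous one still holds the lock:

```shell
# Run backup daily at 03:00 UTC; flock -n exits non-zero immediately
# if the previous run still holds the lock, skipping this invocation
0 3 * * * flock -n /var/lock/clickhouse-backup.lock /usr/local/bin/clickhouse-backup-cron.sh
```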
Retention Policies
Without retention management, backup storage grows indefinitely. A balanced retention policy keeps enough history to recover from delayed-discovery problems without accumulating unnecessary cost.
A practical retention policy:
| Backup Type | Frequency | Retention |
|---|---|---|
| Incremental | Daily | 7 days |
| Full | Weekly (Sunday) | 30 days |
| Full (archive) | Monthly (1st Sunday) | 6 months |
Implement retention cleanup in your cron script or as a separate job:
#!/bin/bash
# /usr/local/bin/clickhouse-backup-cleanup.sh
set -euo pipefail
# Keep the newest 7 remote incrementals, delete anything older
clickhouse-backup list remote \
  | awk '/^incr-/ {print $1}' \
  | head -n -7 \
  | while read -r name; do
      clickhouse-backup delete remote "$name"
    done
# Keep the newest 4 remote fulls (roughly 30 days of weeklies)
# Note: monthly archive fulls need a distinct name prefix (e.g. archive-)
# to survive this cleanup, since "full-" matches them too
clickhouse-backup list remote \
  | awk '/^full-/ {print $1}' \
  | head -n -4 \
  | while read -r name; do
      clickhouse-backup delete remote "$name"
    done
Also configure S3 lifecycle policies as a safety net. Set objects under the backup prefix to transition to Glacier after 60 days and expire after 180 days. This catches anything the cleanup script misses.
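The lifecycle described above can be applied with the AWS CLI. A sketch, with assumptions: the bucket and `cluster-prod/` prefix come from the S3 configuration earlier in this guide, and the day thresholds should be adjusted to your own retention table. The AbortIncompleteMultipartUpload rule additionally cleans up interrupted uploads (see the FAQ below):

```shell
# Write the lifecycle policy, then apply it with the AWS CLI.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "clickhouse-backup-retention",
      "Status": "Enabled",
      "Filter": { "Prefix": "cluster-prod/" },
      "Transitions": [
        { "Days": 60, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 180 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
EOF

# Apply to the backup bucket (requires AWS credentials):
# aws s3api put-bucket-lifecycle-configuration \
#   --bucket your-clickhouse-backups \
#   --lifecycle-configuration file://lifecycle.json
```

Keep the lifecycle days comfortably longer than the cleanup script's retention, so the lifecycle acts as a backstop rather than a second, competing deletion path.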
Restore Procedures
Restoring from a backup involves downloading from remote storage, restoring to the local ClickHouse data directory, and verifying the data.
Full Restore
# ClickHouse must be running: clickhouse-backup restore connects to the
# server to recreate table schemas and attach data parts
systemctl start clickhouse-server
# Download the backup from S3
clickhouse-backup download "full-2026-02-17"
# Restore all tables
clickhouse-backup restore "full-2026-02-17"
# Verify row counts
clickhouse-client --query "SELECT database, table, sum(rows) FROM system.parts WHERE active GROUP BY database, table ORDER BY database, table"
Restoring Specific Tables
If only one table was lost or corrupted, restore it selectively:
clickhouse-backup download "full-2026-02-17"
clickhouse-backup restore --tables="analytics.events" "full-2026-02-17"
Restoring from an Incremental Chain
An incremental backup records which base backup it requires, and clickhouse-backup download resolves this chain automatically -- it fetches any parts the incremental references from the base full and any intermediate incrementals. You therefore download and restore only the incremental you want to recover to:
# Download the incremental; parts it shares with full-2026-02-16
# are fetched from that backup automatically
clickhouse-backup download "incr-2026-02-17"
clickhouse-backup restore "incr-2026-02-17"
If any backup in the chain has been deleted from remote storage, the download fails -- another reason retention cleanup must never remove a full backup that newer incrementals still depend on.
Restoring to a Different Cluster
To migrate or restore data to a different cluster, download the backup on the target node and restore with the --rm flag to clean up existing data:
# On the target node
clickhouse-backup download "full-2026-02-17"
clickhouse-backup restore --rm "full-2026-02-17"
If the target cluster has a different shard/replica configuration, you may need to adjust the ReplicatedMergeTree paths in the table schemas after restore.
Disaster Recovery Testing
A backup you have never tested restoring is not a backup -- it is a hope. Schedule quarterly disaster recovery drills.
DR Testing Checklist
- Provision a staging node -- spin up a single ClickHouse instance that is not connected to your production cluster.
- Download the latest full backup from S3 to the staging node.
- Restore the backup using the procedures above.
- Verify data integrity -- compare row counts, run known queries and compare results against production, check that partitions are complete.
- Measure restore time -- document how long the download and restore took. This is your Recovery Time Objective (RTO) baseline.
- Test incremental restore -- download and apply an incremental on top of the full to verify the chain works.
- Tear down the staging node when done.
Automate this with a script that runs monthly, posts results to Slack or email, and flags failures:
#!/bin/bash
# dr-test.sh -- run on a disposable staging server
set -euo pipefail
# The server must be running before restore: clickhouse-backup connects
# to it to recreate schemas and attach parts
systemctl start clickhouse-server
sleep 10
# awk instead of grep so an empty listing does not trip pipefail
LATEST_FULL=$(clickhouse-backup list remote | awk '/^full-/ {print $1}' | tail -n 1)
START_TIME=$(date +%s)
clickhouse-backup download "$LATEST_FULL"
clickhouse-backup restore "$LATEST_FULL"
ROW_COUNT=$(clickhouse-client --query "SELECT sum(rows) FROM system.parts WHERE active")
END_TIME=$(date +%s)
DURATION=$(( END_TIME - START_TIME ))
echo "DR Test Complete. Backup: ${LATEST_FULL}, Rows: ${ROW_COUNT}, Duration: ${DURATION}s"
Common Backup Mistakes
1. Relying on replication instead of backups. Replication protects against hardware failure. It does not protect against DROP TABLE, bad migrations, or application bugs that write corrupt data. All replicas reflect the same logical state -- if that state is broken, all replicas are broken.
2. Not testing restores. The most common backup failure mode is discovering during a real outage that backups are incomplete, corrupted, or the restore process has a dependency you did not account for. Test restores quarterly at minimum.
3. Storing backups on the same disk or in the same datacenter. A datacenter-level failure takes out both your database and your backups. Use a remote S3 bucket in a different region or provider.
4. No retention policy. Backup storage costs grow linearly. Without cleanup, a 2TB cluster producing daily backups accumulates 60TB of backup data per month. Set retention policies from day one.
5. Ignoring backup monitoring. A cron job that silently fails for three weeks means your most recent backup is three weeks old. Monitor backup job exit codes, track backup sizes over time (sudden drops indicate problems), and alert on missing backups.
6. Backing up only one node in a sharded cluster. Each shard holds different data. If you have 3 shards, you need backups from one replica in each shard -- not just one node from one shard.
7. Not freezing tables during backup. clickhouse-backup handles this automatically via FREEZE, but custom backup scripts that copy data directories without freezing first may capture inconsistent state as ClickHouse merges parts in the background.
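Mistakes 4 and 5 both come down to monitoring. A minimal freshness check (a sketch; the name pattern matches this guide's examples, and the alerting hook is left to cron mail or your monitoring agent) parses the date out of the newest backup name and exits non-zero if it is too old:

```shell
#!/bin/sh
# check-backup-freshness.sh -- exit non-zero if the newest backup is stale.
# Assumes names like full-2026-02-17 or incr-2026-02-18-0300 (prefix,
# then an ISO date). Uses GNU date.
MAX_AGE_DAYS=2

latest_backup_age_days() {
    # $1: newest backup name, e.g. incr-2026-02-18
    backup_date=$(printf '%s' "$1" | sed 's/^[a-z]*-//' | cut -c1-10)
    backup_epoch=$(date -d "$backup_date" +%s)
    now_epoch=$(date +%s)
    echo $(( (now_epoch - backup_epoch) / 86400 ))
}

# In production, read the newest name from the real listing instead:
#   LATEST=$(clickhouse-backup list remote | tail -n 1 | awk '{print $1}')
LATEST="incr-$(date -d '1 day ago' +%Y-%m-%d)"

AGE=$(latest_backup_age_days "$LATEST")
if [ "$AGE" -gt "$MAX_AGE_DAYS" ]; then
    echo "ALERT: newest backup ${LATEST} is ${AGE} days old" >&2
    exit 1    # non-zero exit: let cron mail / monitoring pick it up
fi
echo "OK: newest backup ${LATEST} is ${AGE} days old"
```

Tracking backup sizes over time is the complementary check: a sudden drop in upload size with a successful exit code usually means tables were silently skipped.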
How sshploy Automates ClickHouse Backups
sshploy configures clickhouse-backup as part of its ClickHouse cluster deployment. When you deploy a cluster through sshploy, it installs clickhouse-backup on each node, generates the configuration file with your S3 credentials and bucket, sets up the cron schedule for weekly full and daily incremental backups, configures retention cleanup, and creates a dedicated backup user with the correct permissions. The entire backup pipeline is production-ready from the first deployment without any manual configuration. If you later need to adjust the schedule or retention policy, sshploy's recipe system lets you update the configuration and re-deploy it across all nodes in a single operation.
FAQ
How long does a ClickHouse backup take?
Backup speed depends on data volume and disk I/O. Creating a local backup (freeze + hard links) is nearly instant -- typically under 30 seconds regardless of data size. Uploading to S3 depends on your network bandwidth. On a 1Gbps link, expect roughly 100-120GB per hour. A 2TB full backup upload takes 16-20 hours. Incremental backups are much faster since they only upload changed parts.
Can I back up a ClickHouse cluster while it is receiving writes?
Yes. clickhouse-backup uses ClickHouse's native FREEZE command, which creates a consistent snapshot of data parts using hard links. The freeze is atomic and does not block inserts or queries. Writes that arrive after the freeze are not included in the backup, which is expected behavior -- they will be captured in the next backup.
Should I back up every replica or just one per shard?
One replica per shard is sufficient. All replicas within a shard hold identical data, so backing up multiple replicas of the same shard is redundant. In a 2-shard, 3-replica cluster (6 nodes), you need backups from 2 nodes -- one from each shard. Pick the replica with the least query load to minimize performance impact.
What happens if a backup is interrupted mid-upload?
clickhouse-backup uses S3 multipart uploads. If the upload is interrupted, the incomplete multipart upload remains in S3 -- and continues to incur storage charges -- until it is explicitly aborted; S3 does not clean it up on its own. The backup is not usable in its partial state. Re-run the upload command and it will restart. Configure an S3 lifecycle rule to abort incomplete multipart uploads after 7 days so orphaned parts do not accumulate.
How do I back up ClickHouse to a non-S3 storage provider?
clickhouse-backup supports GCS (Google Cloud Storage), Azure Blob Storage, and any S3-compatible object store (MinIO, Backblaze B2, Cloudflare R2, Wasabi). For GCS, set remote_storage: gcs and provide your service account credentials. For Azure, set remote_storage: azblob and configure your storage account. For S3-compatible providers, use the standard s3 configuration with an endpoint override pointing to the provider's API endpoint.
Ready to deploy?
Skip the manual setup. sshploy handles the entire deployment for you.
Related guides
How to Deploy a ClickHouse Cluster: Complete Setup Guide
Learn how to deploy a production-ready ClickHouse cluster with sharding, replication, and ClickHouse Keeper. Covers architecture decisions, hardware sizing, and step-by-step configuration.
Self-Hosting vs Managed Databases: A Practical Cost & Control Comparison
Compare self-hosted and managed database services across cost, control, performance, and operational overhead. Includes real pricing breakdowns for PostgreSQL, ClickHouse, and Redis.