PostgreSQL High Availability with Patroni: Step-by-Step Guide
Set up a production-ready PostgreSQL HA cluster with Patroni, etcd, PgBouncer, and HAProxy. Covers architecture, automatic failover, connection pooling, and backup strategies.
Running PostgreSQL in production without high availability is a single point of failure. When the primary node goes down — hardware failure, kernel panic, network split — your database is unavailable until someone intervenes manually. For most production applications, that's not acceptable.
Patroni solves this by turning a standalone PostgreSQL instance into a fully automated HA cluster with leader election, automatic failover, and consistent configuration management. This guide walks through the architecture, how the components fit together, and what it takes to run a production Patroni cluster.
Why High Availability Matters for PostgreSQL
PostgreSQL itself does not include cluster management. It supports streaming replication out of the box, but managing which replica becomes the new primary after a failure, reconfiguring replicas to follow the new leader, and preventing split-brain scenarios — none of that is built in.
Without HA, a primary failure requires manual intervention:
- Detect the failure (usually through monitoring, which takes time)
- Promote a replica manually via `pg_ctl promote`
- Update your application's connection string or load balancer
- Reconfigure remaining replicas to follow the new primary
- Deal with any replication divergence
This process takes anywhere from minutes to hours depending on your team's response time. Patroni reduces this to 10-30 seconds of automated failover with no human intervention required.
Architecture Overview
A production Patroni cluster involves four main components working together:
Patroni
Patroni is a Python-based cluster manager that runs on every PostgreSQL node. It handles:
- Leader election via a distributed configuration store
- Automatic failover when the primary becomes unreachable
- Continuous health checks and cluster state management
- REST API for manual operations (`patronictl`)
- PostgreSQL configuration management via `postgresql.conf`
Patroni supports PostgreSQL 10 through 17. For new deployments, target PostgreSQL 15, 16, or 17. Each node runs both the Patroni process and a PostgreSQL instance.
etcd
etcd is a distributed key-value store used as Patroni's distributed lock and consensus system. Patroni uses etcd to:
- Store the current cluster leader
- Maintain a distributed lock to prevent split-brain
- Broadcast cluster state to all nodes
etcd runs on a separate cluster (3 or 5 nodes for quorum). Some deployments co-locate etcd on the same servers as PostgreSQL, which reduces costs but couples failure domains. A dedicated 3-node etcd cluster is preferable for production.
Alternatives to etcd include Consul and ZooKeeper, but etcd is the most common choice with Patroni: it is simpler to operate and integrates cleanly with Patroni's API.
HAProxy
HAProxy sits in front of your PostgreSQL nodes and routes connections to the correct backend:
- Port 5000 — routes to the current primary (read/write)
- Port 5001 — routes to replicas (read-only)
HAProxy performs health checks against Patroni's REST API (the `/primary` and `/replica` endpoints) to determine which nodes to route to. When failover occurs and a new primary is elected, HAProxy detects the change within seconds through health check polling and updates routing automatically. Your application connection strings stay the same.
PgBouncer
PgBouncer is a connection pooler that sits between your application and HAProxy. PostgreSQL handles connections as separate OS processes — at scale, thousands of direct connections cause significant overhead (memory, process scheduling, lock contention). PgBouncer maintains a pool of actual PostgreSQL connections and multiplexes many application connections across them.
In transaction pooling mode, PgBouncer allows a much larger number of application connections than PostgreSQL can practically handle directly. A PostgreSQL instance that bogs down at 500 direct connections can serve thousands of application connections through PgBouncer without issue.
Full Stack Diagram
```
            Application
                 |
   PgBouncer (connection pooling)
                 |
              HAProxy
             /       \
       Primary     Replica(s)
      (Patroni)    (Patroni)
             \       /
        etcd cluster (3 nodes)
```
How Patroni Failover Works
Understanding the failover sequence helps when debugging issues or setting expectations for RTO (recovery time objective).
1. Primary failure — the primary PostgreSQL process crashes or the Patroni agent on the primary stops responding.
2. Leader key expiration — Patroni uses a TTL-based key in etcd as the distributed lock. When the primary's Patroni agent stops refreshing this key, it expires. The default TTL is 30 seconds.
3. Leader election — replica nodes detect the expired key and race to acquire it. The replica with the most up-to-date WAL position wins (or is preferred). This prevents data loss by ensuring the most advanced replica becomes the new leader.
4. Promotion — the winning replica promotes its PostgreSQL instance to primary via `pg_ctl promote`.
5. Cascade replication update — other replicas receive instructions (via etcd) to stop following the old primary and begin streaming from the new one.
6. HAProxy detection — HAProxy's health checks hit Patroni's `/primary` endpoint on each node. The new primary starts returning 200; the old primary (now unreachable) fails. HAProxy routes new connections to the new primary within the next health check interval.
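The leader election described above can be sketched as a simple selection rule. This is an illustrative simplification, not Patroni's actual implementation; `maximum_lag_on_failover` (default 1048576 bytes) is the real setting that bounds acceptable lag:

```python
def pick_failover_candidate(replicas, max_lag_bytes=1048576):
    # Among reachable replicas, exclude any lagging beyond
    # maximum_lag_on_failover, then prefer the most advanced WAL position.
    eligible = [r for r in replicas
                if r["reachable"] and r["lag_bytes"] <= max_lag_bytes]
    if not eligible:
        return None  # no safe candidate; the cluster stays leaderless
    return max(eligible, key=lambda r: r["wal_position"])

replicas = [
    {"name": "node2", "reachable": True, "wal_position": 1000, "lag_bytes": 0},
    {"name": "node3", "reachable": True, "wal_position": 900, "lag_bytes": 2_000_000},
]
print(pick_failover_candidate(replicas)["name"])  # → node2
```

Note that a replica lagging beyond the threshold is excluded entirely: promoting it would silently discard committed transactions.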
Total failover time is typically 10-30 seconds with default settings. This can be tuned lower by reducing the etcd TTL and HAProxy health check intervals, but aggressive settings increase false positives (unnecessary failovers due to transient network issues).
One important note: because PostgreSQL replication is asynchronous by default, a small amount of data written to the old primary after the last successful replication may be lost during failover. If you need zero data loss, configure synchronous replication (`synchronous_standby_names`) at the cost of increased write latency.
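If zero data loss is a hard requirement, Patroni can manage synchronous replication for you rather than you setting `synchronous_standby_names` by hand. A sketch of the relevant cluster-wide settings, applied with `patronictl edit-config` (values are illustrative):

```yaml
# Applied via patronictl edit-config; illustrative values
synchronous_mode: true          # Patroni manages synchronous_standby_names
synchronous_mode_strict: false  # if true, writes block when no sync standby exists
postgresql:
  parameters:
    synchronous_commit: "on"
```

With `synchronous_mode_strict: false`, Patroni falls back to asynchronous replication if no synchronous standby is available, trading the zero-data-loss guarantee for availability.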
Hardware Requirements
For a minimal production cluster you need:
| Node type | Count | Recommended spec |
|---|---|---|
| PostgreSQL + Patroni | 3 | 4+ vCPU, 16GB+ RAM, fast SSD |
| etcd | 3 | 2 vCPU, 4GB RAM, SSD |
| HAProxy + PgBouncer | 1-2 | 2 vCPU, 4GB RAM |
The three PostgreSQL nodes give you one primary and two replicas. With three nodes, you can tolerate one failure while still maintaining quorum for etcd and a remaining replica to fail over to.
For etcd, disk latency matters significantly. etcd writes every key change to disk before acknowledging it (fsync). High disk latency leads to slow leader elections. SSDs or NVMe are required.
For production workloads with large datasets (hundreds of GB to TB), size the PostgreSQL nodes based on your working set. If your hot data fits in `shared_buffers`, you avoid disk reads on the hot path. A reasonable starting point is setting `shared_buffers` to 25% of available RAM.
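With Patroni, memory settings like this belong in the cluster-wide configuration rather than a hand-edited postgresql.conf. A sketch of how the 25% guideline might look for a 16GB node (values here are assumptions, not tuned recommendations):

```yaml
bootstrap:
  dcs:
    postgresql:
      parameters:
        shared_buffers: 4GB        # ~25% of a 16GB node
        effective_cache_size: 12GB # planner hint, roughly RAM minus shared_buffers
```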
Cost Comparison: AWS RDS vs Self-Hosted
Managed PostgreSQL on AWS RDS removes operational overhead but at a significant cost premium.
| Setup | Configuration | Monthly cost |
|---|---|---|
| AWS RDS Multi-AZ | db.r6g.large (2 vCPU, 16GB) | ~$200 |
| AWS RDS Multi-AZ | db.r6g.xlarge (4 vCPU, 32GB) | ~$400 |
| Hetzner 3-node Patroni | CPX31 x3 (4 vCPU, 8GB each) | ~$45 |
| Hetzner 3-node Patroni | CPX41 x3 (8 vCPU, 16GB each) | ~$90 |
| Hetzner 3-node Patroni | CX52 x3 (16 vCPU, 32GB each) | ~$200 |
At equivalent specs, self-hosting on Hetzner runs at roughly 20-30% of the AWS RDS cost. The Hetzner CPX41 configuration (3 nodes at 8 vCPU / 16GB each) costs about $90/month and exceeds the capabilities of an RDS db.r6g.large instance at $200/month.
The cost gap widens further at scale. RDS charges separately for storage, I/O, backups, data transfer, and read replicas. Self-hosted storage on Hetzner is included, and you control your backup strategy.
The trade-off is operational responsibility. AWS RDS handles minor version upgrades, automated backups, monitoring, and some failure scenarios automatically. With self-hosted Patroni, you own the operations. Tooling like sshploy reduces this burden significantly, but it doesn't eliminate it entirely.
Step-by-Step Setup Overview
Setting up a Patroni cluster from scratch involves the following steps:
1. Provision servers
Provision three servers that can communicate over a private network. Open ports for PostgreSQL (5432), Patroni REST API (8008), and etcd (2379, 2380).
2. Install etcd
Install etcd on your three coordination nodes. Configure a cluster with each node knowing the addresses of the others. Initialize the cluster and verify quorum with `etcdctl endpoint health`.
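A minimal static-bootstrap configuration for the first etcd node might look like this (the file path, node names, and IPs are assumptions matching the addresses used later in this guide):

```yaml
# Hypothetical /etc/etcd/etcd.conf.yml for the first node
name: etcd1
data-dir: /var/lib/etcd
listen-client-urls: http://192.168.1.10:2379
advertise-client-urls: http://192.168.1.10:2379
listen-peer-urls: http://192.168.1.10:2380
initial-advertise-peer-urls: http://192.168.1.10:2380
initial-cluster: etcd1=http://192.168.1.10:2380,etcd2=http://192.168.1.11:2380,etcd3=http://192.168.1.12:2380
initial-cluster-state: new
initial-cluster-token: patroni-etcd
```

The other two nodes use the same `initial-cluster` line with their own `name` and addresses. For production, put client and peer traffic behind TLS rather than plain HTTP.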
3. Install PostgreSQL
Install PostgreSQL 15, 16, or 17 on each node. Do not initialize a data directory — Patroni will manage this. Install Patroni and its dependencies (the `patroni` package on Debian/Ubuntu, or `pip install patroni[etcd3]`).
4. Configure Patroni
Write a patroni.yml configuration file on each node specifying:
```yaml
scope: postgres-cluster
namespace: /db/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.1:8008

etcd3:
  hosts: 192.168.1.10:2379,192.168.1.11:2379,192.168.1.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.1:5432
  data_dir: /var/lib/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: <password>
    superuser:
      username: postgres
      password: <password>
```
5. Bootstrap the cluster
Start Patroni on all three nodes. The first node to acquire the etcd leader key initializes the PostgreSQL data directory and becomes the primary. Other nodes clone the primary's data directory via `pg_basebackup` and start as replicas.
Verify cluster state with `patronictl -c /etc/patroni.yml list`.
6. Install and configure HAProxy
Configure HAProxy with two backends: one for the primary (checking Patroni's /primary endpoint) and one for replicas (checking /replica). Set health check intervals to 2-5 seconds for fast routing updates.
7. Install and configure PgBouncer
Install PgBouncer on the same or a separate node. Point it at HAProxy's primary endpoint. Configure transaction pooling mode and set `default_pool_size` based on your PostgreSQL `max_connections` minus connections reserved for superusers and Patroni.
8. Configure pgBackRest for backups
Install pgBackRest and configure a stanza for your cluster. Store backups in an S3-compatible object store or a dedicated backup server. Configure a weekly full backup and daily differentials with continuous WAL archiving enabled. Test restores periodically.
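A starting-point pgBackRest configuration for an S3-compatible store might look like the following (the stanza name `main`, bucket, and endpoint are placeholders):

```ini
; Hypothetical /etc/pgbackrest/pgbackrest.conf
; repo1-s3-key / repo1-s3-key-secret (credentials) omitted here
[global]
repo1-type=s3
repo1-s3-bucket=my-db-backups
repo1-s3-endpoint=s3.example.com
repo1-s3-region=us-east-1
repo1-retention-full=2
process-max=4

[main]
pg1-path=/var/lib/postgresql/17/main
```

PostgreSQL's `archive_command` is then pointed at pgBackRest (for example `pgbackrest --stanza=main archive-push %p`) so WAL segments stream to the repository continuously.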
Connection Pooling with PgBouncer
PgBouncer operates in three modes:
- Session pooling — a PostgreSQL connection is held for the duration of a client session. Useful if you use session-level features like advisory locks or `SET` commands that persist.
- Transaction pooling — a PostgreSQL connection is held only for the duration of a transaction, then returned to the pool. This is the most efficient mode and handles the highest concurrency. Avoid session-level state.
- Statement pooling — a connection is held only for a single statement. Very restrictive; rarely used.
For most OLTP applications, transaction pooling is the right choice. With it, 100 actual PostgreSQL connections can serve thousands of concurrent application connections.
Key configuration parameters:
```ini
[databases]
mydb = host=127.0.0.1 port=5000 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
reserve_pool_size = 10
server_idle_timeout = 600
```
Set `default_pool_size` to roughly 80% of PostgreSQL's `max_connections` divided by the number of PgBouncer instances, leaving headroom for Patroni and administrative connections.
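The sizing rule above can be written out as a small calculation (illustrative only; validate against your real workload):

```python
def pgbouncer_pool_size(max_connections: int, reserved: int, instances: int) -> int:
    # ~80% of the server connections left after reserving some for
    # Patroni and superuser access, split across PgBouncer instances.
    usable = max_connections - reserved
    return int(usable * 0.8 // instances)

# e.g. max_connections=200, 10 reserved, 2 PgBouncer instances
print(pgbouncer_pool_size(200, 10, 2))  # → 76
```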
Read/Write Splitting with HAProxy
HAProxy can route read and write traffic to different endpoints. This lets you offload read queries (reports, analytics, read-heavy API calls) to replicas without changing your application's primary connection logic.
```
frontend postgres_write
    bind *:5000
    mode tcp
    default_backend primary

frontend postgres_read
    bind *:5001
    mode tcp
    default_backend replicas

backend primary
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    server node1 192.168.1.1:5432 check port 8008
    server node2 192.168.1.2:5432 check port 8008
    server node3 192.168.1.3:5432 check port 8008

backend replicas
    mode tcp
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    server node1 192.168.1.1:5432 check port 8008
    server node2 192.168.1.2:5432 check port 8008
    server node3 192.168.1.3:5432 check port 8008
```

Note `mode tcp`: PostgreSQL's wire protocol must be proxied at the TCP level, while the health checks still speak HTTP to Patroni's REST API on port 8008.
Your application connects to port 5000 for writes and port 5001 for reads. During failover, HAProxy's health checks detect the new primary automatically. The application does not need to handle failover logic — it reconnects to the same address and hits the new primary.
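On the application side, the split reduces to choosing a port by intent. A minimal sketch, assuming HAProxy at a hypothetical address:

```python
HAPROXY_HOST = "192.168.1.20"  # assumed HAProxy address

def dsn(readonly: bool = False) -> str:
    # Port 5000 routes to the current primary, 5001 to replicas (round-robin)
    port = 5001 if readonly else 5000
    return f"host={HAPROXY_HOST} port={port} dbname=mydb"

print(dsn())               # writes go to the primary
print(dsn(readonly=True))  # reads go to replicas
```

Because failover changes nothing about these addresses, the only failover handling the application needs is ordinary reconnect-on-error logic.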
Backups with pgBackRest
pgBackRest is the standard tool for PostgreSQL backup and point-in-time recovery. It supports:
- Full, differential, and incremental backups
- WAL archiving and streaming for continuous archiving
- Parallel backup and restore
- S3, Azure Blob, GCS, and local storage backends
- Compression and encryption
- Multi-repository backups (backup to two places simultaneously)
A typical backup schedule for production:
- Full backup — once per week
- Differential backup — daily
- WAL archiving — continuous (enables PITR to any point within your retention window)
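Assuming backups run from cron as the postgres user with a stanza named `main` (both assumptions), that schedule might look like:

```
# Hypothetical crontab entries on the backup node
30 2 * * 0   pgbackrest --stanza=main --type=full backup   # weekly full (Sunday)
30 2 * * 1-6 pgbackrest --stanza=main --type=diff backup   # daily differential
```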
Configure pgBackRest to read from a replica rather than the primary. This avoids any I/O impact on primary write performance during backup windows.
Test your restore process regularly. A backup you've never tested is not a backup — it's an assumption. Run a restore to a test server quarterly and verify data integrity.
Cross-Region Replicas
For disaster recovery or geo-distributed reads, Patroni supports cascade replication. A replica in a second region streams WAL from a primary-region replica rather than directly from the primary, reducing cross-region bandwidth usage.
Configure the cross-region replica outside Patroni's cluster scope (as a standalone streaming replica) or as a standby cluster using Patroni's standby cluster mode. Standby cluster mode keeps the replica cluster in sync and allows it to be promoted to an independent primary if the main region becomes unavailable.
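A sketch of the standby cluster settings in the DR region's patroni.yml (the host and port here are assumed to point at a primary-region replica):

```yaml
bootstrap:
  dcs:
    standby_cluster:
      host: 10.0.1.5   # primary-region replica to stream from
      port: 5432
      create_replica_methods:
        - basebackup
```

The standby cluster elects its own local leader, but that leader stays in recovery and streams from the configured host until the `standby_cluster` section is removed, which promotes it to an independent primary.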
Key considerations for cross-region replicas:
- Network latency between regions increases replication lag. Set `recovery_min_apply_delay` to buffer WAL and protect against operational errors propagating immediately to the DR replica.
- Cross-region data transfer costs add up. Monitor replication bandwidth and consider WAL compression.
- Test the promotion procedure for your DR replica before you need it.
How sshploy Helps
Setting up a Patroni cluster from scratch takes days of work: provisioning servers, installing packages, writing configuration files, debugging etcd quorum issues, tuning HAProxy health checks, and wiring pgBackRest into WAL archiving. Getting all of it right requires familiarity with each component.
sshploy automates this entire process. You specify your servers, configure your desired topology (number of nodes, PostgreSQL version, backup destination), and sshploy runs the Ansible playbooks that deploy and configure everything:
- PostgreSQL 15, 16, or 17 on each node
- Patroni with etcd as the DCS
- HAProxy with primary and replica backends
- PgBouncer in transaction pooling mode
- pgBackRest configured with WAL archiving to your backup store
- Firewall rules and security hardening
The deployment runs over SSH directly against your servers — no cloud-specific APIs, no agents to install, no vendor lock-in. You retain full control of your infrastructure.
After deployment, sshploy gives you the connection strings, HAProxy endpoints, and patronictl access to manage your cluster.
FAQ
How long does Patroni failover take?
With default settings, failover completes in 10-30 seconds. The largest factor is the etcd TTL (default 30 seconds) — this is how long Patroni waits before declaring the primary dead. You can reduce the TTL and loop_wait values to achieve faster failover, but very aggressive settings (below 10 seconds) risk false positives on networks with transient latency.
Does Patroni work with PostgreSQL 17?
Yes. Patroni supports PostgreSQL 10 through 17. For new deployments, PostgreSQL 16 or 17 is recommended. PostgreSQL 17 includes improvements to logical replication, VACUUM, and query planning. There are no Patroni-specific limitations for recent PostgreSQL versions.
What is the difference between Patroni and Spilo?
Spilo is a Docker image that packages PostgreSQL and Patroni together, originally developed by Zalando for Kubernetes deployments. Patroni is the underlying cluster manager that Spilo uses. If you're deploying on bare metal or VMs (which is the typical sshploy use case), you run Patroni directly rather than using Spilo. Spilo adds value in Kubernetes environments where the operator pattern fits.
Can I run PgBouncer on the same server as PostgreSQL?
You can, but it adds resource contention and couples the pooler's availability to the PostgreSQL node's availability. For production, running PgBouncer on a separate node (or co-located with HAProxy) is preferable. This also lets you scale PgBouncer independently if connection count becomes a bottleneck.
How do I perform a zero-downtime PostgreSQL minor version upgrade?
With Patroni, minor version upgrades (e.g., PostgreSQL 16.1 to 16.4) can be done with rolling restarts:
- Update the PostgreSQL package on one replica.
- Restart the Patroni service on that replica. Patroni restarts PostgreSQL and the replica rejoins the cluster.
- Repeat for remaining replicas.
- Perform a controlled switchover (`patronictl switchover`) to promote an upgraded replica to primary.
- Restart (and upgrade) the old primary, which is now a replica.
Your application experiences no downtime — connections route through HAProxy to available nodes throughout the process. Major version upgrades (e.g., PostgreSQL 15 to 16) require `pg_upgrade` and more planning, but the cluster architecture keeps downtime minimal.
Ready to deploy?
Skip the manual setup. sshploy handles the entire deployment for you.
Deploy PostgreSQL with Patroni
Related guides
Self-Hosting vs Managed Databases: A Practical Cost & Control Comparison
Compare self-hosted and managed database services across cost, control, performance, and operational overhead. Includes real pricing breakdowns for PostgreSQL, ClickHouse, and Redis.
How to Deploy a ClickHouse Cluster: Complete Setup Guide
Learn how to deploy a production-ready ClickHouse cluster with sharding, replication, and ClickHouse Keeper. Covers architecture decisions, hardware sizing, and step-by-step configuration.