PostgreSQL High Availability with Patroni: Step-by-Step Guide
Set up a production-ready PostgreSQL HA cluster with Patroni, etcd, PgBouncer, and HAProxy. Covers architecture, automatic failover, connection pooling, and backup strategies.
Running PostgreSQL in production without high availability is a single point of failure. When the primary node goes down — hardware failure, kernel panic, network split — your database is unavailable until someone intervenes manually. For most production applications, that's not acceptable.
Patroni solves this by turning a standalone PostgreSQL instance into a fully automated HA cluster with leader election, automatic failover, and consistent configuration management. This guide walks through the architecture, how the components fit together, and what it takes to run a production Patroni cluster.
Why High Availability Matters for PostgreSQL
PostgreSQL itself does not include cluster management. It supports streaming replication out of the box, but managing which replica becomes the new primary after a failure, reconfiguring replicas to follow the new leader, and preventing split-brain scenarios — none of that is built in.
Without HA, a primary failure requires manual intervention:
- Detect the failure (usually through monitoring, which takes time)
- Promote a replica manually via `pg_ctl promote`
- Update your application's connection string or load balancer
- Reconfigure remaining replicas to follow the new primary
- Deal with any replication divergence
This process takes anywhere from minutes to hours depending on your team's response time. Patroni reduces this to 10-30 seconds of automated failover with no human intervention required.
Architecture Overview
A production Patroni cluster involves four main components working together:
Patroni
Patroni is a Python-based cluster manager that runs on every PostgreSQL node. It handles:
- Leader election via a distributed configuration store
- Automatic failover when the primary becomes unreachable
- Continuous health checks and cluster state management
- REST API for manual operations (`patronictl`)
- PostgreSQL configuration management via `postgresql.conf`
Patroni supports PostgreSQL 10 through 17. For new deployments, target PostgreSQL 15, 16, or 17. Each node runs both the Patroni process and a PostgreSQL instance.
etcd
etcd is a distributed key-value store used as Patroni's distributed lock and consensus system. Patroni uses etcd to:
- Store the current cluster leader
- Maintain a distributed lock to prevent split-brain
- Broadcast cluster state to all nodes
etcd runs on a separate cluster (3 or 5 nodes for quorum). Some deployments co-locate etcd on the same servers as PostgreSQL, which reduces costs but couples failure domains. A dedicated 3-node etcd cluster is preferable for production.
Alternatives to etcd include Consul and ZooKeeper, but etcd is the most common choice with Patroni: it is simpler to operate and integrates cleanly with Patroni's API.
HAProxy
HAProxy sits in front of your PostgreSQL nodes and routes connections to the correct backend:
- Port 5000 — routes to the current primary (read/write)
- Port 5001 — routes to replicas (read-only)
HAProxy performs health checks against Patroni's REST API (the `/primary` and `/replica` endpoints) to determine which nodes to route to. When failover occurs and a new primary is elected, HAProxy detects the change within seconds through health check polling and updates routing automatically. Your application connection strings stay the same.
PgBouncer
PgBouncer is a connection pooler that sits between your application and HAProxy. PostgreSQL handles connections as separate OS processes — at scale, thousands of direct connections cause significant overhead (memory, process scheduling, lock contention). PgBouncer maintains a pool of actual PostgreSQL connections and multiplexes many application connections across them.
In transaction pooling mode, PgBouncer allows a much larger number of application connections than PostgreSQL can practically handle directly. A PostgreSQL instance that bogs down at 500 direct connections can serve thousands of application connections through PgBouncer without issue.
Full Stack Diagram
```
            Application
                 |
   PgBouncer (connection pooling)
                 |
              HAProxy
             /       \
       Primary     Replica(s)
      (Patroni)    (Patroni)
             \       /
        etcd cluster (3 nodes)
```
How Patroni Failover Works
Understanding the failover sequence helps when debugging issues or setting expectations for RTO (recovery time objective).
1. Primary failure — the primary PostgreSQL process crashes or the Patroni agent on the primary stops responding.
2. Leader key expiration — Patroni uses a TTL-based key in etcd as the distributed lock. When the primary's Patroni agent stops refreshing this key, it expires. The default TTL is 30 seconds.
3. Leader election — replica nodes detect the expired key and race to acquire it. The replica with the most up-to-date WAL position wins (or is preferred). This prevents data loss by ensuring the most advanced replica becomes the new leader.
4. Promotion — the winning replica promotes its PostgreSQL instance to primary via `pg_ctl promote`.
5. Cascade replication update — other replicas receive instructions (via etcd) to stop following the old primary and begin streaming from the new one.
6. HAProxy detection — HAProxy's health checks hit Patroni's `/primary` endpoint on each node. The new primary starts returning 200; the old primary (now unreachable) fails. HAProxy routes new connections to the new primary within the next health check interval.
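The leader election described above can be sketched as a simple selection rule. This is an illustrative simplification, not Patroni's actual implementation; `maximum_lag_on_failover` (default 1048576 bytes) is the real setting that bounds acceptable lag:

```python
def pick_failover_candidate(replicas, max_lag_bytes=1048576):
    # Among reachable replicas, exclude any lagging beyond
    # maximum_lag_on_failover, then prefer the most advanced WAL position.
    eligible = [r for r in replicas
                if r["reachable"] and r["lag_bytes"] <= max_lag_bytes]
    if not eligible:
        return None  # no safe candidate; the cluster stays leaderless
    return max(eligible, key=lambda r: r["wal_position"])

replicas = [
    {"name": "node2", "reachable": True, "wal_position": 1000, "lag_bytes": 0},
    {"name": "node3", "reachable": True, "wal_position": 900, "lag_bytes": 2_000_000},
]
print(pick_failover_candidate(replicas)["name"])  # → node2
```

Note that a replica lagging beyond the threshold is excluded entirely: promoting it would silently discard committed transactions.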
Total failover time is typically 10-30 seconds with default settings. This can be tuned lower by reducing the etcd TTL and HAProxy health check intervals, but aggressive settings increase false positives (unnecessary failovers due to transient network issues).
One important note: because PostgreSQL replication is asynchronous by default, a small amount of data written to the old primary after the last successful replication may be lost during failover. If you need zero data loss, configure synchronous replication (`synchronous_standby_names`) at the cost of increased write latency.
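If zero data loss is a hard requirement, Patroni can manage synchronous replication for you rather than you setting `synchronous_standby_names` by hand. A sketch of the relevant cluster-wide settings, applied with `patronictl edit-config` (values are illustrative):

```yaml
# Applied via patronictl edit-config; illustrative values
synchronous_mode: true          # Patroni manages synchronous_standby_names
synchronous_mode_strict: false  # if true, writes block when no sync standby exists
postgresql:
  parameters:
    synchronous_commit: "on"
```

With `synchronous_mode_strict: false`, Patroni falls back to asynchronous replication if no synchronous standby is available, trading the zero-data-loss guarantee for availability.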
Hardware Requirements
For a minimal production cluster you need:
| Node type | Count | Recommended spec |
|---|---|---|
| PostgreSQL + Patroni | 3 | 4+ vCPU, 16GB+ RAM, fast SSD |
| etcd | 3 | 2 vCPU, 4GB RAM, SSD |
| HAProxy + PgBouncer | 1-2 | 2 vCPU, 4GB RAM |
The three PostgreSQL nodes give you one primary and two replicas. With three nodes, you can tolerate one failure while still maintaining quorum for etcd and a remaining replica to fail over to.
For etcd, disk latency matters significantly. etcd writes every key change to disk before acknowledging it (fsync). High disk latency leads to slow leader elections. SSDs or NVMe are required.
For production workloads with large datasets (hundreds of GB to TB), size the PostgreSQL nodes based on your working set. If your hot data fits in `shared_buffers`, you avoid disk reads on the hot path. A reasonable starting point is setting `shared_buffers` to 25% of available RAM.
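With Patroni, memory settings like this belong in the cluster-wide configuration rather than a hand-edited postgresql.conf. A sketch of how the 25% guideline might look for a 16GB node (values here are assumptions, not tuned recommendations):

```yaml
bootstrap:
  dcs:
    postgresql:
      parameters:
        shared_buffers: 4GB        # ~25% of a 16GB node
        effective_cache_size: 12GB # planner hint, roughly RAM minus shared_buffers
```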
Cost Comparison: AWS RDS vs Self-Hosted
Managed PostgreSQL on AWS RDS removes operational overhead but at a significant cost premium.
| Setup | Configuration | Monthly cost |
|---|---|---|
| AWS RDS Multi-AZ | db.r6g.large (2 vCPU, 16GB) | ~$200 |
| AWS RDS Multi-AZ | db.r6g.xlarge (4 vCPU, 32GB) | ~$400 |
| Hetzner 3-node Patroni | CPX31 x3 (4 vCPU, 8GB each) | ~$45 |
| Hetzner 3-node Patroni | CPX41 x3 (8 vCPU, 16GB each) | ~$90 |
| Hetzner 3-node Patroni | CX52 x3 (16 vCPU, 32GB each) | ~$200 |
At equivalent specs, self-hosting on Hetzner runs at roughly 20-30% of the AWS RDS cost. The Hetzner CPX41 configuration (3 nodes at 8 vCPU / 16GB each) costs about $90/month and exceeds the capabilities of an RDS db.r6g.large instance at $200/month.
The cost gap widens further at scale. RDS charges separately for storage, I/O, backups, data transfer, and read replicas. Self-hosted storage on Hetzner is included, and you control your backup strategy.
The trade-off is operational responsibility. AWS RDS handles minor version upgrades, automated backups, monitoring, and some failure scenarios automatically. With self-hosted Patroni, you own the operations. Tooling like sshploy reduces this burden significantly, but it doesn't eliminate it entirely.
Step-by-Step Setup Overview
Setting up a Patroni cluster from scratch involves the following steps:
1. Provision servers
Provision three servers that can communicate over a private network. Open ports for PostgreSQL (5432), Patroni REST API (8008), and etcd (2379, 2380).
2. Install etcd
Install etcd on your three coordination nodes. Configure a cluster with each node knowing the addresses of the others. Initialize the cluster and verify quorum with `etcdctl endpoint health`.
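A minimal static-bootstrap configuration for the first etcd node might look like this (the file path, node names, and IPs are assumptions matching the addresses used later in this guide):

```yaml
# Hypothetical /etc/etcd/etcd.conf.yml for the first node
name: etcd1
data-dir: /var/lib/etcd
listen-client-urls: http://192.168.1.10:2379
advertise-client-urls: http://192.168.1.10:2379
listen-peer-urls: http://192.168.1.10:2380
initial-advertise-peer-urls: http://192.168.1.10:2380
initial-cluster: etcd1=http://192.168.1.10:2380,etcd2=http://192.168.1.11:2380,etcd3=http://192.168.1.12:2380
initial-cluster-state: new
initial-cluster-token: patroni-etcd
```

The other two nodes use the same `initial-cluster` line with their own `name` and addresses. For production, put client and peer traffic behind TLS rather than plain HTTP.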
3. Install PostgreSQL
Install PostgreSQL 15, 16, or 17 on each node. Do not initialize a data directory — Patroni will manage this. Install Patroni and its dependencies (the `patroni` package on Debian/Ubuntu, or `pip install patroni[etcd3]`).
4. Configure Patroni
Write a patroni.yml configuration file on each node specifying:
```yaml
scope: postgres-cluster
namespace: /db/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.1:8008

etcd3:
  hosts: 192.168.1.10:2379,192.168.1.11:2379,192.168.1.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.1:5432
  data_dir: /var/lib/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: <password>
    superuser:
      username: postgres
      password: <password>
```
5. Bootstrap the cluster
Start Patroni on all three nodes. The first node to acquire the etcd leader key initializes the PostgreSQL data directory and becomes the primary. Other nodes clone the primary's data directory via `pg_basebackup` and start as replicas.
Verify cluster state with `patronictl -c /etc/patroni.yml list`.
6. Install and configure HAProxy
Configure HAProxy with two backends: one for the primary (checking Patroni's /primary endpoint) and one for replicas (checking /replica). Set health check intervals to 2-5 seconds for fast routing updates.
7. Install and configure PgBouncer
Install PgBouncer on the same or a separate node. Point it at HAProxy's primary endpoint. Configure transaction pooling mode and set `default_pool_size` based on your PostgreSQL `max_connections` minus connections reserved for superusers and Patroni.
8. Configure pgBackRest for backups
Install pgBackRest and configure a stanza for your cluster. Store backups in an S3-compatible object store or a dedicated backup server. Configure a weekly full backup and daily differentials with continuous WAL archiving enabled. Test restores periodically.
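A starting-point pgBackRest configuration for an S3-compatible store might look like the following (the stanza name `main`, bucket, and endpoint are placeholders):

```ini
; Hypothetical /etc/pgbackrest/pgbackrest.conf
; repo1-s3-key / repo1-s3-key-secret (credentials) omitted here
[global]
repo1-type=s3
repo1-s3-bucket=my-db-backups
repo1-s3-endpoint=s3.example.com
repo1-s3-region=us-east-1
repo1-retention-full=2
process-max=4

[main]
pg1-path=/var/lib/postgresql/17/main
```

PostgreSQL's `archive_command` is then pointed at pgBackRest (for example `pgbackrest --stanza=main archive-push %p`) so WAL segments stream to the repository continuously.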
Connection Pooling with PgBouncer
PgBouncer operates in three modes:
- Session pooling — a PostgreSQL connection is held for the duration of a client session. Useful if you use session-level features like advisory locks or `SET` commands that persist.
- Transaction pooling — a PostgreSQL connection is held only for the duration of a transaction, then returned to the pool. This is the most efficient mode and handles the highest concurrency. Avoid session-level state.
- Statement pooling — a connection is held only for a single statement. Very restrictive; rarely used.
For most OLTP applications, transaction pooling is the right choice. With it, 100 actual PostgreSQL connections can serve thousands of concurrent application connections.
Key configuration parameters:
```ini
[databases]
mydb = host=127.0.0.1 port=5000 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
reserve_pool_size = 10
server_idle_timeout = 600
```
Set `default_pool_size` to roughly 80% of PostgreSQL's `max_connections` divided by the number of PgBouncer instances, leaving headroom for Patroni and administrative connections.
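The sizing rule above can be written out as a small calculation (illustrative only; validate against your real workload):

```python
def pgbouncer_pool_size(max_connections: int, reserved: int, instances: int) -> int:
    # ~80% of the server connections left after reserving some for
    # Patroni and superuser access, split across PgBouncer instances.
    usable = max_connections - reserved
    return int(usable * 0.8 // instances)

# e.g. max_connections=200, 10 reserved, 2 PgBouncer instances
print(pgbouncer_pool_size(200, 10, 2))  # → 76
```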
Read/Write Splitting with HAProxy
HAProxy can route read and write traffic to different endpoints. This lets you offload read queries (reports, analytics, read-heavy API calls) to replicas without changing your application's primary connection logic.
```
frontend postgres_write
    bind *:5000
    mode tcp
    default_backend primary

frontend postgres_read
    bind *:5001
    mode tcp
    default_backend replicas

backend primary
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    server node1 192.168.1.1:5432 check port 8008
    server node2 192.168.1.2:5432 check port 8008
    server node3 192.168.1.3:5432 check port 8008

backend replicas
    mode tcp
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    server node1 192.168.1.1:5432 check port 8008
    server node2 192.168.1.2:5432 check port 8008
    server node3 192.168.1.3:5432 check port 8008
```

Note `mode tcp`: PostgreSQL's wire protocol must be proxied at the TCP level, while the health checks still speak HTTP to Patroni's REST API on port 8008.
Your application connects to port 5000 for writes and port 5001 for reads. During failover, HAProxy's health checks detect the new primary automatically. The application does not need to handle failover logic — it reconnects to the same address and hits the new primary.
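On the application side, the split reduces to choosing a port by intent. A minimal sketch, assuming HAProxy at a hypothetical address:

```python
HAPROXY_HOST = "192.168.1.20"  # assumed HAProxy address

def dsn(readonly: bool = False) -> str:
    # Port 5000 routes to the current primary, 5001 to replicas (round-robin)
    port = 5001 if readonly else 5000
    return f"host={HAPROXY_HOST} port={port} dbname=mydb"

print(dsn())               # writes go to the primary
print(dsn(readonly=True))  # reads go to replicas
```

Because failover changes nothing about these addresses, the only failover handling the application needs is ordinary reconnect-on-error logic.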
Backups with pgBackRest
pgBackRest is the standard tool for PostgreSQL backup and point-in-time recovery. It supports:
- Full, differential, and incremental backups
- WAL archiving and streaming for continuous archiving
- Parallel backup and restore
- S3, Azure Blob, GCS, and local storage backends
- Compression and encryption
- Multi-repository backups (backup to two places simultaneously)
A typical backup schedule for production:
- Full backup — once per week
- Differential backup — daily
- WAL archiving — continuous (enables PITR to any point within your retention window)
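Assuming backups run from cron as the postgres user with a stanza named `main` (both assumptions), that schedule might look like:

```
# Hypothetical crontab entries on the backup node
30 2 * * 0   pgbackrest --stanza=main --type=full backup   # weekly full (Sunday)
30 2 * * 1-6 pgbackrest --stanza=main --type=diff backup   # daily differential
```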
Configure pgBackRest to read from a replica rather than the primary. This avoids any I/O impact on primary write performance during backup windows.
Test your restore process regularly. A backup you've never tested is not a backup — it's an assumption. Run a restore to a test server quarterly and verify data integrity.
Cross-Region Replicas
For disaster recovery or geo-distributed reads, Patroni supports cascade replication. A replica in a second region streams WAL from a primary-region replica rather than directly from the primary, reducing cross-region bandwidth usage.
Configure the cross-region replica outside Patroni's cluster scope (as a standalone streaming replica) or as a standby cluster using Patroni's standby cluster mode. Standby cluster mode keeps the replica cluster in sync and allows it to be promoted to an independent primary if the main region becomes unavailable.
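A sketch of the standby cluster settings in the DR region's patroni.yml (the host and port here are assumed to point at a primary-region replica):

```yaml
bootstrap:
  dcs:
    standby_cluster:
      host: 10.0.1.5   # primary-region replica to stream from
      port: 5432
      create_replica_methods:
        - basebackup
```

The standby cluster elects its own local leader, but that leader stays in recovery and streams from the configured host until the `standby_cluster` section is removed, which promotes it to an independent primary.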
Key considerations for cross-region replicas:
- Network latency between regions increases replication lag. Set `recovery_min_apply_delay` to buffer WAL and protect against operational errors propagating immediately to the DR replica.
- Cross-region data transfer costs add up. Monitor replication bandwidth and consider WAL compression.
- Test the promotion procedure for your DR replica before you need it.
How sshploy Helps
Setting up a Patroni cluster from scratch takes days of work: provisioning servers, installing packages, writing configuration files, debugging etcd quorum issues, tuning HAProxy health checks, and wiring pgBackRest into WAL archiving. Getting all of it right requires familiarity with each component.
sshploy automates this entire process. You specify your servers, configure your desired topology (number of nodes, PostgreSQL version, backup destination), and sshploy runs the Ansible playbooks that deploy and configure everything:
- PostgreSQL 15, 16, or 17 on each node
- Patroni with etcd as the DCS
- HAProxy with primary and replica backends
- PgBouncer in transaction pooling mode
- pgBackRest configured with WAL archiving to your backup store
- Firewall rules and security hardening
The deployment runs over SSH directly against your servers — no cloud-specific APIs, no agents to install, no vendor lock-in. You retain full control of your infrastructure.
After deployment, sshploy gives you the connection strings, HAProxy endpoints, and patronictl access to manage your cluster.
FAQ
How long does Patroni failover take?
With default settings, failover completes in 10-30 seconds. The largest factor is the etcd TTL (default 30 seconds) — this is how long Patroni waits before declaring the primary dead. You can reduce the TTL and loop_wait values to achieve faster failover, but very aggressive settings (below 10 seconds) risk false positives on networks with transient latency.
Does Patroni work with PostgreSQL 17?
Yes. Patroni supports PostgreSQL 10 through 17. For new deployments, PostgreSQL 16 or 17 is recommended. PostgreSQL 17 includes improvements to logical replication, VACUUM, and query planning. There are no Patroni-specific limitations for recent PostgreSQL versions.
What is the difference between Patroni and Spilo?
Spilo is a Docker image that packages PostgreSQL and Patroni together, originally developed by Zalando for Kubernetes deployments. Patroni is the underlying cluster manager that Spilo uses. If you're deploying on bare metal or VMs (which is the typical sshploy use case), you run Patroni directly rather than using Spilo. Spilo adds value in Kubernetes environments where the operator pattern fits.
Can I run PgBouncer on the same server as PostgreSQL?
You can, but it adds resource contention and couples the pooler's availability to the PostgreSQL node's availability. For production, running PgBouncer on a separate node (or co-located with HAProxy) is preferable. This also lets you scale PgBouncer independently if connection count becomes a bottleneck.
How do I perform a zero-downtime PostgreSQL minor version upgrade?
With Patroni, minor version upgrades (e.g., PostgreSQL 16.1 to 16.4) can be done with rolling restarts:
- Update the PostgreSQL package on one replica.
- Restart the Patroni service on that replica. Patroni restarts PostgreSQL and the replica rejoins the cluster.
- Repeat for remaining replicas.
- Perform a controlled switchover (`patronictl switchover`) to promote an upgraded replica to primary.
- Restart (and upgrade) the old primary, which is now a replica.
Your application experiences no downtime — connections route through HAProxy to available nodes throughout the process. Major version upgrades (e.g., PostgreSQL 15 to 16) require `pg_upgrade` and more planning, but the cluster architecture keeps downtime minimal.
Ready to deploy?
Skip the manual setup. sshploy handles the entire deployment for you.
Deploy PostgreSQL with Patroni
Related guides
Self-Hosting vs Managed Databases: A Practical Cost & Control Comparison
Compare self-hosted and managed database services across cost, control, performance, and operational overhead. Includes real pricing breakdowns for PostgreSQL, ClickHouse, and Redis.
How to Deploy a ClickHouse Cluster: Complete Setup Guide
Learn how to deploy a production-ready ClickHouse cluster with sharding, replication, and ClickHouse Keeper. Covers architecture decisions, hardware sizing, and step-by-step configuration.