The decision about how to scale PostgreSQL is, at its core, a decision about where to place the boundary between your application and the database. PostgreSQL exposes a stable, well-documented wire protocol. Every scaling strategy does one of three things with that protocol: it relays it unchanged through a proxy, keeps it intact while extending the engine behind it, or terminates it at a managed endpoint that hides the machinery below. Choosing among these options is easier once the protocol itself is treated as the contract being defended.

This guide compares three positions relative to that contract. A wire-protocol proxy such as PgDog speaks the Postgres protocol on both sides and routes traffic without the application noticing. An in-database extension such as Citus turns a coordinator node into a distributed query planner while leaving the protocol untouched. A managed elastic platform such as Amazon Aurora, Google AlloyDB, Neon or Supabase moves the scaling machinery below a managed endpoint, where it is largely invisible to the client. The recommendations and decision trees at the end map concrete workloads to these options.

The thesis is straightforward. Most teams reach for sharding too early and for managed elasticity too late. The wire protocol gives a clean way to reason about which problem you actually have, because it tells you exactly what the client sees and where the scaling logic lives.

Foundations & Current State

PostgreSQL communication is a stream of typed messages, not a request-response API in the HTTP sense. The official documentation states that “all communication is through a stream of messages” and that “during normal operation, the frontend sends queries and other commands to the backend, and the backend sends back query results and other responses” (PostgreSQL protocol overview). Each message is framed by a single type byte followed by a length: the documentation describes “Byte1(‘Q’) Identifies the message as a simple query” and “Int32 Length of message contents in bytes, including self” (PostgreSQL message formats). This framing is the reason a proxy can sit in the middle and understand the conversation without parsing SQL semantics first.
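
The framing rule quoted above can be made concrete with a minimal sketch. It builds a simple-query message and splits it back into type, body and remainder, using only the layout the documentation describes (one type byte, an Int32 length that counts itself, then the payload). The function names `frame_simple_query` and `read_frame` are illustrative, not part of any library.

```python
import struct

def frame_simple_query(sql: str) -> bytes:
    """Frame a simple-query ('Q') message: type byte, Int32 length, C string."""
    payload = sql.encode("utf-8") + b"\x00"   # the query text is null-terminated
    length = 4 + len(payload)                 # length counts itself, not the type byte
    return b"Q" + struct.pack("!i", length) + payload

def read_frame(buf: bytes) -> tuple[bytes, bytes, bytes]:
    """Split one complete frame off the front of buf: (type, body, remainder)."""
    msg_type = buf[0:1]
    (length,) = struct.unpack("!i", buf[1:5])
    end = 1 + length
    return msg_type, buf[5:end], buf[end:]

msg = frame_simple_query("SELECT 1")
msg_type, body, rest = read_frame(msg)
print(msg_type, body, rest)
```

Nothing in `read_frame` looks at SQL. A proxy can walk a byte stream frame by frame using only the type byte and the length, which is the property the paragraph above relies on.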

There are two sub-protocols, and the difference matters for every scaling tool that touches the connection. In the simple query protocol “the frontend just sends a textual query string, which is parsed and immediately executed by the backend.” In the extended query protocol “processing of queries is separated into multiple steps: parsing, binding of parameter values, and execution” (PostgreSQL protocol overview). The extended protocol uses a sequence of Parse, Bind and Execute messages, terminated by a Sync, and it enables pipelining. The documentation notes that pipelining “reduces the number of network round trips needed to complete a given series of operations” (PostgreSQL message flow). Prepared statements live in the extended protocol, and they are the single most common source of incompatibility when a pooler or proxy is introduced.
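
As a rough illustration of the extended protocol, the sketch below assembles Parse, Bind, Execute and Sync into one buffer, the way a pipelining client might before a single network write. The field layouts follow the PostgreSQL message-format documentation. The helper names, the statement name `s1` and the text-format parameters are assumptions made for the example.

```python
import struct

def _msg(type_byte: bytes, body: bytes) -> bytes:
    # Every frontend message here: type byte + Int32 length (including itself) + body
    return type_byte + struct.pack("!i", 4 + len(body)) + body

def _cstr(s: str) -> bytes:
    return s.encode("utf-8") + b"\x00"

def parse(stmt: str, sql: str) -> bytes:
    # Statement name, query text, then zero explicitly typed parameters
    return _msg(b"P", _cstr(stmt) + _cstr(sql) + struct.pack("!h", 0))

def bind(portal: str, stmt: str, params: list[str]) -> bytes:
    body = _cstr(portal) + _cstr(stmt)
    body += struct.pack("!h", 0)               # no format codes: parameters are text
    body += struct.pack("!h", len(params))
    for p in params:
        raw = p.encode("utf-8")
        body += struct.pack("!i", len(raw)) + raw
    body += struct.pack("!h", 0)               # no result format codes: results are text
    return _msg(b"B", body)

def execute(portal: str, max_rows: int = 0) -> bytes:
    return _msg(b"E", _cstr(portal) + struct.pack("!i", max_rows))  # 0 = no row limit

def sync() -> bytes:
    return _msg(b"S", b"")

def message_types(buf: bytes) -> str:
    """Walk the frames in buf and return their type bytes in order."""
    types = []
    while buf:
        (length,) = struct.unpack("!i", buf[1:5])
        types.append(buf[0:1].decode("ascii"))
        buf = buf[1 + length:]
    return "".join(types)

pipeline = parse("s1", "SELECT $1::int + 1") + bind("", "s1", ["41"]) + execute("") + sync()
print(message_types(pipeline))
```

The named statement `s1` is what makes poolers nervous: it lives on one server connection, so a later Bind that lands on a different connection will not find it unless the pooler intervenes.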

Connection pooling is the foundation, not an afterthought

Before sharding, before read replicas, the first scaling wall most teams hit is connection exhaustion. PostgreSQL handles each connection with a dedicated server process, because “whenever it detects a request for a connection, it spawns a new backend process” (PostgreSQL connection establishment). Thousands of idle application connections therefore become thousands of expensive server processes. Connection poolers solve this by multiplexing many client connections onto a small set of server connections, and the pooling mode determines what application behavior survives.

PgBouncer, the long-standing reference pooler, documents three modes. In session pooling “a server connection will be assigned to it for the whole duration it stays connected.” In transaction pooling “a server connection is assigned to a client only during a transaction.” Statement pooling is “transaction pooling with a twist: Multi-statement transactions are disallowed” (PgBouncer features). Transaction pooling delivers the highest connection density that still permits multi-statement transactions, but it breaks session-scoped features such as session-level prepared statements, session-level advisory locks and LISTEN/NOTIFY unless the pooler explicitly compensates.
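
The difference between the modes is easiest to see in a toy model. The sketch below simulates transaction pooling only: a client holds a server connection from its first statement until its transaction ends, and a statement outside an explicit transaction releases the connection immediately. The class `TransactionPool` and its behavior on exhaustion are assumptions for illustration, not PgBouncer's implementation.

```python
from collections import deque

class TransactionPool:
    """Toy transaction-mode pool: server connections are held per transaction."""

    def __init__(self, server_conns: list[str]):
        self.idle = deque(server_conns)
        self.assigned: dict[str, str] = {}   # client id -> server connection
        self.in_txn: set[str] = set()        # clients inside BEGIN ... COMMIT/ROLLBACK

    def handle(self, client: str, statement: str) -> str:
        verb = statement.strip().split()[0].upper()
        if client not in self.assigned:
            if not self.idle:
                # A real pooler would queue the client instead of failing
                raise RuntimeError("no idle server connection")
            self.assigned[client] = self.idle.popleft()
        server = self.assigned[client]
        if verb == "BEGIN":
            self.in_txn.add(client)
        elif verb in ("COMMIT", "ROLLBACK"):
            self.in_txn.discard(client)
        if client not in self.in_txn:
            # Transaction finished (or autocommit statement): release the connection
            self.idle.append(self.assigned.pop(client))
        return server

pool = TransactionPool(["server-1"])
print([pool.handle(c, "SELECT 1") for c in ("a", "b", "c")])
```

Three clients share one server connection because none of them holds a transaction open. The same model shows why session state is lost: nothing guarantees that a client's next statement reaches the same server connection.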

The three positions relative to the wire protocol

The current state of the ecosystem can be organized by where each tool sits relative to the protocol boundary.

| Position | Representative tools | What the client connects to | Where scaling logic lives |
| --- | --- | --- | --- |
| Wire-protocol proxy | PgDog, PgCat, Supavisor, PgBouncer | A proxy that emulates a Postgres server | In the proxy, outside the database |
| In-database extension | Citus | A Postgres coordinator node | Inside the database engine |
| Managed elastic platform | Aurora, AlloyDB, Neon, Supabase | A managed endpoint or CNAME | Below the endpoint, in the platform |

The proxy approach is exemplified by PgDog, which describes itself as a “drop-in PostgreSQL proxy, no app changes required” that can “share just a few Postgres connections between 100,000+ clients” (PgDog). Its own repository describes it as “a proxy for scaling PostgreSQL” that “supports connection pooling, load balancing queries and sharding entire databases,” and notes that it is “written in Rust” and “can manage thousands of connections on commodity hardware” (PgDog repository). The latest release at the time of writing is v0.1.44, published 11 June 2026 (PgDog releases). The version number is a useful reminder: this is early software.

How a proxy stays transparent

A wire-protocol proxy earns the “drop-in” label only if it tracks protocol state precisely. PgDog explains that “Postgres has two ways to send queries over the network: Simple protocol and Extended protocol,” and that “Postgres messages have a standard format” where “each message starts with a single ASCII letter (1 byte), identifying the message type” (PgDog wire-protocol internals). Because the proxy “can see every byte sent between Postgres and the clients,” it must keep “the protocol in sync” so that it “can manipulate what Postgres receives and what messages are sent back to the client” (PgDog wire-protocol internals). To route sharded rows correctly, PgDog even reuses Postgres hashing code directly through a foreign function interface, reporting that “using the cc (C/C++ compiler) library and by copy/pasting some code, we have a working FFI interface to hashfn.c straight out of the Postgres source code tree” (PgDog wire-protocol internals).
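
One concrete piece of "keeping the protocol in sync" is knowing when a server connection is between transactions. The backend reports this in every ReadyForQuery message (`Z`), whose single status byte is `I` (idle), `T` (in a transaction block) or `E` (failed transaction block). The sketch below walks a backend byte stream and returns the last status seen. It illustrates the idea only and is not taken from PgDog; the function names are hypothetical.

```python
import struct
from typing import Optional

def last_transaction_status(backend_bytes: bytes) -> Optional[str]:
    """Return the status byte of the last complete ReadyForQuery frame, if any."""
    status = None
    buf = backend_bytes
    while len(buf) >= 5:
        (length,) = struct.unpack("!i", buf[1:5])
        end = 1 + length
        if len(buf) < end:
            break                                  # partial frame: wait for more bytes
        if buf[0:1] == b"Z":
            status = buf[5:6].decode("ascii")      # 'I', 'T' or 'E'
        buf = buf[end:]
    return status

# Helpers that build backend messages for the demonstration
def ready_for_query(status: str) -> bytes:
    return b"Z" + struct.pack("!i", 5) + status.encode("ascii")

def command_complete(tag: str) -> bytes:
    body = tag.encode("ascii") + b"\x00"
    return b"C" + struct.pack("!i", 4 + len(body)) + body

stream = command_complete("BEGIN") + ready_for_query("T")
print(last_transaction_status(stream))
```

In a transaction-mode pooler, a status of `I` is the natural signal that the server connection could be handed to another client, while `T` or `E` means it must stay with the current one.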

The architectural payoff of a modern proxy is concurrency. PgDog uses the Tokio asynchronous runtime and multiple threads, and its own benchmark reports that “PgDog is faster than PgCat across the board” and “also faster than PgBouncer once we use more than 50 connections,” because “multi-threaded poolers were able to process more requests simultaneously” (PgDog versus PgBouncer benchmark). The same benchmark is candid about the limits: “with just a few connections, libevent is slightly faster,” and threads help only when “the pool size was increased as well” (PgDog versus PgBouncer benchmark). The lesson for adopters is that a proxy is a throughput instrument, not a latency one, and it pays off at high connection counts.

The extension approach is exemplified by Citus, which “is a PostgreSQL extension that allows commodity database servers (called nodes) to coordinate with one another in a ‘shared nothing’ architecture.” Every cluster “has one special node called the coordinator (the others are known as workers),” and applications “send their queries to the coordinator node which relays it to the relevant workers and accumulates the results” (Citus concepts). Because Citus is an extension rather than a fork, it tracks PostgreSQL closely. Citus 14.0 “introduces support for PostgreSQL 18,” and the release page confirms “Citus 14.0 brings PostgreSQL 18.1 support” (Citus 14 announcement; Citus releases).

The managed approach separates storage from compute below the endpoint. The Aurora FAQ states that “Aurora decouples compute from storage” and that the system “automatically divides your database volume into 10 GB segments spread across many disks,” where “each 10 GB chunk of your database volume is replicated six ways, across three AZs” (Amazon Aurora FAQ). AlloyDB takes the same shape, described as a “fully managed, PostgreSQL-compatible database service” that “uses a disaggregated architecture, where compute and storage layers are separate and scale independently” (AlloyDB overview). AlloyDB also folds analytical acceleration into the same engine through a columnar store: its “columnar engine can accelerate the performance of analytical queries by storing data in memory using a columnar format,” reorganizing selected columns into “a column-oriented format” and using “30% of your instance’s memory” by default (AlloyDB columnar engine). This hybrid transactional and analytical capability is a distinguishing feature of the managed tier that neither a proxy nor a stock extension provides.
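
The quoted figures translate into simple arithmetic. The sketch below computes how many 10 GB segments and replicated segment copies a volume implies, and the default columnar memory for a given instance size. The 95 GB volume and 64 GB instance are hypothetical inputs, and the model ignores everything except the quoted constants.

```python
import math

SEGMENT_GB = 10            # segment size quoted from the Aurora FAQ
COPIES_PER_SEGMENT = 6     # "replicated six ways, across three AZs"
COLUMNAR_DEFAULT_PCT = 30  # AlloyDB columnar engine default share of memory

def aurora_segment_copies(volume_gb: float) -> tuple[int, int]:
    """Return (segments, total segment copies) for a volume of the given size."""
    segments = math.ceil(volume_gb / SEGMENT_GB)
    return segments, segments * COPIES_PER_SEGMENT

def alloydb_columnar_default_gb(instance_memory_gb: float) -> float:
    """Memory the columnar engine would use by default, in GB."""
    return instance_memory_gb * COLUMNAR_DEFAULT_PCT / 100

print(aurora_segment_copies(95))          # hypothetical 95 GB volume
print(alloydb_columnar_default_gb(64))    # hypothetical 64 GB instance
```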

The current Postgres baseline under all of this is version 18, released 25 September 2025, which introduced “a new asynchronous I/O (AIO) subsystem” with “performance gains of up to 3x in certain scenarios,” along with skip scan on multicolumn B-tree indexes, OAuth authentication, UUIDv7 generation and virtual generated columns (PostgreSQL 18 release announcement). A scaling decision made today should assume PostgreSQL 18 as the floor, because the major scaling tools already support it.

Five trends define the direction of PostgreSQL scaling, and each one strengthens the case for treating the wire protocol as the boundary.

The first trend is the normalization of storage-compute separation. What was once an Aurora-specific innovation is now the default architecture for new entrants. Neon “is a serverless database that splits the system into two independent layers: compute and storage,” where “compute can scale up, scale down, go idle, and be restarted instantly without risking data loss” (Neon architecture overview). Neon explicitly frames this as shared lineage with Aurora, noting that “modern OLTP database systems such as Neon and AWS Aurora separate storage and compute” so that “storage appears ‘bottomless’ and compute scales up and down with the load” (Neon storage and compute performance). The separation has a known cost, because it “incurs an additional network hop from compute to storage which leads to higher latency on buffer pool misses,” and the mitigation is a resizable local file cache (LFC) “sized to be around the size of the physical RAM on a compute node, to provide RAM-like latency” (Neon storage and compute performance). The same architecture enables cheap branching: “when you create a branch in Neon, the engine does not duplicate files or pages,” instead diverging “using copy-on-write semantics” so that “only new or modified data consumes additional storage” (Neon architecture overview). Database branches that behave like Git branches are a genuinely new capability, and they exist only because storage and compute were decoupled first.
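
Copy-on-write branching can be sketched with a toy page store in which a branch records only the pages it has modified and falls back to its parent for everything else. This is a conceptual model, not Neon's storage engine. The `Branch` class is hypothetical, and the toy assumes the parent is not written after the branch point, whereas a real engine would pin the branch to a point in the parent's history.

```python
from typing import Optional

class Branch:
    """Toy copy-on-write page store: a branch owns only the pages it changed."""

    def __init__(self, parent: Optional["Branch"] = None):
        self.parent = parent
        self.own_pages: dict[int, bytes] = {}

    def write(self, page_no: int, data: bytes) -> None:
        self.own_pages[page_no] = data          # never touches the parent's pages

    def read(self, page_no: int) -> bytes:
        node: Optional[Branch] = self
        while node is not None:
            if page_no in node.own_pages:
                return node.own_pages[page_no]
            node = node.parent                  # fall through to the ancestor
        raise KeyError(page_no)

    def branch(self) -> "Branch":
        return Branch(parent=self)              # no pages are copied here

main = Branch()
main.write(1, b"users v1")
main.write(2, b"orders v1")

dev = main.branch()
dev.write(2, b"orders v2")
print(dev.read(1), dev.read(2), main.read(2), len(dev.own_pages))
```

Creating `dev` costs nothing in storage, and after one write it owns exactly one page, which mirrors the claim that "only new or modified data consumes additional storage."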

The second trend is scale-to-zero as a billing primitive. Aurora Serverless v2 added the ability to scale to zero capacity on 20 November 2024, so that “with 0 ACUs, customers can now save cost during periods of database inactivity,” and “when the first connection is requested, the database will automatically resume and scale to meet the application demand” (Aurora Serverless v2 scaling to zero). Neon offers the same behavior, where “compute endpoints can scale to zero entirely” (Neon architecture overview). For intermittent workloads, idle cost is approaching zero, which changes the economics of running many small databases.

The third trend is the return of horizontal sharding as a first-class, supported capability rather than a custom application concern. Aurora PostgreSQL Limitless Database reached general availability on 31 October 2024 as “a new serverless horizontal scaling (sharding) capability,” letting customers “scale beyond the existing Aurora limits for write throughput and storage by distributing a database workload over multiple Aurora writer instances while maintaining the ability to use it as a single database” (Aurora Limitless general availability). PgDog brings sharding to the proxy layer, where “sharding splits a PostgreSQL database and all its tables and indices between multiple machines” and the proxy can “extract sharding hints directly from the SQL using the PostgreSQL parser” (PgDog sharding). Citus continues to refine in-database sharding, including schema-based sharding where “tables from the same schema are placed on the same node, while different schemas may be on different nodes” (Citus schema-based sharding).

The fourth trend is the most consequential for the next decade: AI agents are becoming the primary consumer of databases. When Databricks announced its intent to acquire Neon on 14 May 2025, it disclosed that “over 80 percent of the databases provisioned on Neon were created automatically by AI agents rather than by humans,” and that “Neon can spin up a fully isolated Postgres instance in 500 milliseconds or less” (Databricks intent to acquire Neon). The resulting product, Lakebase, is positioned as “serverless Postgres for apps and agents” that “scales automatically and branches like code” (Databricks Lakebase). When databases are created by code at machine speed, fast provisioning, scale-to-zero and branching stop being conveniences and become requirements.

The fifth trend is wire-protocol innovation at the network edge. Serverless runtimes that lack raw TCP forced a re-examination of how clients reach Postgres at all. Neon observed that “PostgreSQL connections are made over TCP” but “modern serverless platforms like Cloudflare Workers or Vercel Edge Functions, based on V8 isolates, generally don’t talk TCP” (Neon serverless driver). Its answer keeps the protocol intact by tunneling it: the driver “redirects the PostgreSQL wire protocol via a special proxy” over a WebSocket so that “you get a real, ordinary Postgres connection via a familiar, ordinary Postgres driver” (Neon serverless driver). The protocol is durable enough that vendors prefer to transport it differently rather than replace it.

Market as a Consumer

For an adopter, the relevant market questions are not market size or revenue. They are viability questions: Will this project still exist and be maintained in five years? What does its license require of me? How much am I locking myself in, and what does exit cost? The answers differ sharply across the three positions.

Licensing is the first filter, and it is frequently overlooked. Both leading open-source scaling tools use the GNU Affero General Public License. PgDog “is free and open source software licensed under the AGPL-3.0 license,” though it clarifies that “this license allows anyone to use PgDog internally without sharing source code” and that plugins “can be licensed under any license you wish” (PgDog about). Citus is likewise “an open-source extension of Postgres” under AGPL-3.0 (Citus 14 announcement; Citus repository). For internal deployment the AGPL is generally unproblematic, but organizations that embed these tools into a redistributed product must review the obligations carefully.

Vendor and project longevity is the second filter, and recent events make it concrete.

  • PgDog is early but funded. Its founder announced “$5.5M from Basis Set, YC, Pioneer Fund and other great investors,” stating “we have years of runway” (PgDog funding announcement). A pre-1.0 version number and a single-vendor project argue for caution in mission-critical paths, even with funding in place.
  • Citus is mature but carries a mixed institutional signal. It is the engine behind Microsoft’s managed offering, where “Azure Cosmos DB for PostgreSQL is powered by the Citus open source extension to PostgreSQL.” Yet Microsoft’s own documentation now states that “Azure Cosmos DB for PostgreSQL is on a retirement path and no longer recommended for new projects,” directing PostgreSQL workloads to “the Elastic Clusters feature of Azure Database For PostgreSQL” (Azure Cosmos DB for PostgreSQL introduction). The Citus extension continues, but the branding and packaging around it are shifting, which is exactly the kind of churn an adopter should price in.
  • Neon was acquired by Databricks, announced 14 May 2025, with the explicit goal of serving AI-driven development (Databricks intent to acquire Neon). Acquisition can mean investment or it can mean redirection toward a parent company strategy. Adopters should weigh both outcomes.

Lock-in is the third filter, and it correlates directly with position on the protocol boundary. A proxy preserves the protocol, so the exit path is to remove the proxy and connect directly. An extension keeps standard Postgres underneath, so the data remains portable even if distribution features are abandoned, a point Citus emphasizes by noting that as an extension it avoids maintaining “an entire fork of the complete PostgreSQL codebase” (Citus as a PostgreSQL extension). A managed platform offers the lowest operational burden but the highest lock-in, because the scaling features and operational model are proprietary even when the SQL surface is Postgres-compatible.

Pricing models are the fourth filter, and the managed platforms now compete on consumption-based billing. Neon offers plans where “you pay only for what you use; there’s no minimum monthly fee,” storage “is billed on actual usage in GB-months, measured hourly,” and “each CU allocates approximately 4 GB of RAM, along with associated CPU and local SSD resources” (Neon plans). Aurora Serverless v2 measures capacity in ACUs, “where each ACU is a combination of approximately 2 gibibytes (GiB) of memory, corresponding CPU, and networking” (Aurora Serverless v2 scaling to zero). Consumption billing rewards spiky and intermittent workloads and penalizes steady high-utilization workloads, where a fixed-size instance is usually cheaper.
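
The trade-off between consumption billing and a fixed-size instance reduces to a break-even calculation. The sketch below finds the fraction of the month a database must be busy before a fixed instance becomes cheaper. Every price in it is hypothetical and chosen only to make the arithmetic visible; real rates vary by provider, region and plan.

```python
HOURS_PER_MONTH = 730.0   # common billing approximation

def consumption_cost(units: float, busy_fraction: float, price_per_unit_hour: float) -> float:
    """Monthly cost when compute is billed only while the database is active."""
    return units * busy_fraction * HOURS_PER_MONTH * price_per_unit_hour

def break_even_busy_fraction(fixed_monthly_cost: float, units: float,
                             price_per_unit_hour: float) -> float:
    """Busy fraction above which the fixed-size instance is cheaper."""
    return fixed_monthly_cost / (units * HOURS_PER_MONTH * price_per_unit_hour)

# Hypothetical numbers, not taken from any provider's price list
UNIT_PRICE = 0.10     # currency per compute-unit-hour
FIXED_COST = 146.0    # currency per month for a comparable fixed instance

threshold = break_even_busy_fraction(FIXED_COST, units=4, price_per_unit_hour=UNIT_PRICE)
print(round(threshold, 2))
print(round(consumption_cost(4, 0.25, UNIT_PRICE), 2))
```

Under these assumed prices, a workload that is busy less than half the month favors consumption billing, and a steadier one favors the fixed instance.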

Recommendations

The recommendations follow from a single principle: solve the problem you have, at the layer closest to it, with the least disruption to the wire-protocol contract.

Start with connection pooling before anything else. If the symptom is connection exhaustion rather than CPU, memory or storage limits, a pooler is the correct and cheapest fix. Transaction-mode pooling delivers the highest density, and the modern multi-threaded poolers handle the prepared-statement problem that historically made transaction mode painful. PgDog “supports prepared statements in transaction mode,” recording each Parse in “a global cache” and giving every client “its own mapping of prepared statement names” (PgDog prepared statements). Supavisor demonstrates that this approach scales, “providing a scalable and cloud-native Postgres connection pooler that can handle millions of connections” (Supavisor).
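
The idea of a global cache with per-client name mappings can be sketched as follows. Each distinct SQL text gets one pooler-wide statement name, each client keeps its own mapping from the name it chose to that global name, and the pooler tracks which server connections have already seen the Parse. This is a conceptual model with hypothetical names, not PgDog's code.

```python
class PreparedStatementCache:
    """Toy model of prepared-statement handling in a transaction-mode pooler."""

    def __init__(self):
        self.global_names: dict[str, str] = {}            # SQL text -> global name
        self.client_maps: dict[str, dict[str, str]] = {}  # client -> {client name -> global name}
        self.prepared_on: dict[str, set[str]] = {}        # server conn -> global names it holds

    def on_parse(self, client: str, client_name: str, sql: str) -> str:
        """Record a client's Parse and return the global name that replaces its name."""
        global_name = self.global_names.setdefault(sql, f"__pool_{len(self.global_names) + 1}")
        self.client_maps.setdefault(client, {})[client_name] = global_name
        return global_name

    def before_bind(self, client: str, client_name: str, server: str) -> tuple[str, bool]:
        """Return (global name, whether Parse must first be sent to this server)."""
        global_name = self.client_maps[client][client_name]
        known = self.prepared_on.setdefault(server, set())
        needs_parse = global_name not in known
        known.add(global_name)
        return global_name, needs_parse

cache = PreparedStatementCache()
cache.on_parse("client-a", "stmt", "SELECT * FROM users WHERE id = $1")
cache.on_parse("client-b", "stmt", "SELECT * FROM orders WHERE id = $1")
print(cache.before_bind("client-a", "stmt", "server-1"))
print(cache.before_bind("client-a", "stmt", "server-2"))
```

Two clients can both call their statement `stmt` without colliding, and a Bind that lands on a server connection which has not seen the statement triggers a Parse first.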

Scale reads with replicas before sharding writes. A large fraction of workloads are read-heavy, and read replicas are far simpler than any sharding scheme. Supabase describes the model plainly: “Read Replicas are additional databases kept in sync with your Primary database,” replication “is asynchronous to ensure that transactions on the Primary aren’t blocked” and “a load balancer automatically balances requests between your Primary database and Read Replicas” (Supabase read replicas). Aurora makes replica lag negligible because replicas “share the same data volume as the primary instance,” with “lag times in the tens of milliseconds” (Amazon Aurora FAQ). Accept eventual consistency on the read path and most read scaling problems disappear.
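
A read/write split can be sketched in a few lines. The router below sends plain `SELECT` statements outside a transaction to replicas in round-robin order and everything else to the primary. Classifying by the first keyword is a deliberate simplification: a production router would parse the SQL and would also have to handle functions with side effects. The class and host names are hypothetical.

```python
import itertools

class ReadWriteRouter:
    """Naive router: plain SELECTs go to replicas, everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # SELECT ... FOR UPDATE takes row locks, so it must run on the primary
        is_read = verb == "SELECT" and "FOR UPDATE" not in sql.upper()
        if is_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
for statement in ("SELECT 1", "select 2", "UPDATE t SET x = 1", "SELECT 3"):
    print(statement, "->", router.route(statement))
```

Statements inside an explicit transaction stay on the primary, because a transaction that reads its own writes cannot tolerate replica lag.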

Choose managed elastic Postgres when operational simplicity outweighs lock-in. For teams without a dedicated database operations function, the managed platforms remove the hardest work: storage durability, failover and capacity management. Aurora’s failover is automatic, flipping “the canonical name record (CNAME) for your DB Instance to point at the healthy replica,” and “start-to-finish, failover typically completes within 30 seconds” (Amazon Aurora FAQ). For intermittent or agent-generated workloads, scale-to-zero platforms are the natural fit, given the AI-agent provisioning patterns described above.

Plan the client side of failover, because automatic failover is only half the story. Aurora documents that the application must detect a dead connection quickly: “turning on TCP keepalive parameters and setting them aggressively ensures that if your client can’t connect to the database, any active connections are quickly closed,” and these settings “should notify the application within five seconds when the database stops responding” (Aurora PostgreSQL fast failover). It further recommends “setting the java DNS time to live (TTL) to a low value, such as under 30 seconds,” so the driver does not cache a stale endpoint (Aurora PostgreSQL fast failover). A managed platform handles promotion, but a misconfigured client can still turn a 30-second failover into a multi-minute outage.
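
On the client side, the keepalive advice maps onto libpq connection parameters (`keepalives`, `keepalives_idle`, `keepalives_interval`, `keepalives_count`, `connect_timeout`). The sketch below builds a connection string with aggressive settings and estimates how long a dead peer can go unnoticed on an idle connection. The host name and the specific values are illustrative assumptions, not the values recommended by the Aurora documentation.

```python
def keepalive_dsn(host: str, dbname: str, idle_s: int = 1, interval_s: int = 1,
                  count: int = 5, connect_timeout_s: int = 3) -> str:
    """Build a libpq keyword/value connection string with TCP keepalives enabled."""
    params = {
        "host": host,
        "dbname": dbname,
        "connect_timeout": connect_timeout_s,   # give up on new connections quickly
        "keepalives": 1,                        # enable TCP keepalives
        "keepalives_idle": idle_s,              # idle seconds before the first probe
        "keepalives_interval": interval_s,      # seconds between probes
        "keepalives_count": count,              # unanswered probes before the drop
    }
    return " ".join(f"{key}={value}" for key, value in params.items())

def worst_case_detection_seconds(idle_s: int, interval_s: int, count: int) -> int:
    """Approximate time for an idle connection to notice that the peer is gone."""
    return idle_s + interval_s * count

dsn = keepalive_dsn("db.example.internal", "app")   # hypothetical endpoint
print(dsn)
print(worst_case_detection_seconds(1, 1, 5))
```

The string can be passed to any libpq-based driver. Detection time grows linearly with the probe count, so the three keepalive values should be tuned together rather than one at a time.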

Choose a wire-protocol proxy when you need sharding without re-platforming. PgDog’s value is that it preserves the protocol while distributing data, routing queries by extracting “sharding hints directly from the SQL,” and sending key-less queries “to all shards with results collected and transformed, as if they came from one database” (PgDog sharding). The constraint to plan around is that “EXECUTE of prepared statements requiring sharding isn’t supported” yet (PgDog prepared statements). Given the early version number, treat a proxy-based shard layer as a deliberate engineering investment, not a turnkey solution.
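
The routing rule described above has two cases: a query with a sharding key goes to one shard, and a query without one fans out to all shards and has its results merged. The sketch below models both. It uses CRC32 as a stand-in hash purely for illustration; PgDog, as noted earlier, reuses PostgreSQL's own hashing code, and the function names here are hypothetical.

```python
import heapq
import zlib
from typing import Optional

def shard_for(key: str, num_shards: int) -> int:
    # Stand-in hash (CRC32), not the PostgreSQL hash function a real proxy would use
    return zlib.crc32(key.encode("utf-8")) % num_shards

def route(sharding_key: Optional[str], num_shards: int) -> list[int]:
    """Return the shards a query must visit."""
    if sharding_key is None:
        return list(range(num_shards))       # no key: fan out to every shard
    return [shard_for(sharding_key, num_shards)]

def merge_sorted(per_shard_rows: list[list[int]]) -> list[int]:
    """Combine already-sorted per-shard results, as for a cross-shard ORDER BY."""
    return list(heapq.merge(*per_shard_rows))

print(route("tenant-42", 3))
print(route(None, 3))
print(merge_sorted([[1, 4], [2, 3], []]))
```

The merge step is where fan-out queries get expensive: every shard does work, and the proxy must hold and reorder the combined result before the client sees it "as if they came from one database."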

Choose an in-database extension when you want distributed Postgres with strong transactional guarantees and minimal client change. Citus keeps the wire contract unchanged and provides genuine distributed transactions through two-phase commit: when a client commits, “the Citus coordinator initiates the 2PC protocol,” sending “PREPARE TRANSACTION to both worker nodes” and then “COMMIT PREPARED” once they vote to commit (How Citus executes distributed transactions). For multi-tenant SaaS, schema-based sharding offers “almost no data modelling restrictions or special steps compared to unsharded PostgreSQL” (Citus schema-based sharding), which makes it the lowest-friction sharding option for that workload shape. The row-based alternative is equally well documented: Citus advises that “each tenant’s data can be stored together in a single database instance and kept isolated,” achieved by “making sure every table in our schema has a column to clearly mark which tenant owns which rows” (Citus multi-tenant guide). That distribution column drives co-location, so tenant-scoped queries stay on a single node and avoid cross-shard coordination entirely.
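
The two-phase commit sequence quoted above can be simulated with in-memory workers. The coordinator sends `PREPARE TRANSACTION` to each worker, and only if all succeed does it send `COMMIT PREPARED`; otherwise it rolls back the workers that did prepare. This is a sequential toy with hypothetical names. A real coordinator such as Citus also has to persist its decision and recover from crashes between the two phases, which the sketch omits.

```python
class Worker:
    """In-memory stand-in for a worker node that records the commands it receives."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare
        self.log: list[str] = []

    def run(self, sql: str) -> bool:
        self.log.append(sql)
        if sql.startswith("PREPARE TRANSACTION") and not self.can_prepare:
            return False     # in PostgreSQL a failed PREPARE rolls the transaction back
        return True

def two_phase_commit(workers: list[Worker], gid: str) -> bool:
    prepared: list[Worker] = []
    # Phase 1: every worker must durably prepare its part of the transaction
    for worker in workers:
        if worker.run(f"PREPARE TRANSACTION '{gid}'"):
            prepared.append(worker)
        else:
            for done in prepared:
                done.run(f"ROLLBACK PREPARED '{gid}'")
            return False
    # Phase 2: all votes were yes, so the commit is now safe to finish everywhere
    for worker in workers:
        worker.run(f"COMMIT PREPARED '{gid}'")
    return True

w1, w2 = Worker("worker-1"), Worker("worker-2")
print(two_phase_commit([w1, w2], "tx1"), w1.log)
```

Tenant-scoped queries that stay on one node skip this protocol entirely, which is the practical reason co-location by tenant column matters.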

Reserve native sharding for genuine write-throughput or dataset-size ceilings. Sharding adds operational and query-planning complexity at every layer, regardless of where it lives. Aurora Limitless, PgDog and Citus all exist because single-node write limits are real, but they are limits most applications never reach. When you do reach them, prefer the option that keeps the protocol boundary intact, because that choice preserves the most exit paths.

Decision Trees

The following decision tree encodes the recommendations above. It begins with the symptom, not the solution, because the most common scaling mistake is selecting a tool before diagnosing the bottleneck.

flowchart TD
    Start[Postgres is under pressure] --> Q1{What is the bottleneck}
    Q1 -->|Too many connections| Pool[Add a connection pooler in transaction mode]
    Q1 -->|Read load too high| Q2{Can reads tolerate replica lag}
    Q1 -->|Write or storage ceiling| Q3{Do you run database operations yourself}
    Q2 -->|Yes| Replica[Add read replicas behind a load balancer]
    Q2 -->|No| Vertical[Scale the primary vertically first]
    Q3 -->|No, prefer managed| Q4{Is the workload steady or intermittent}
    Q3 -->|Yes, self-managed| Q5{Do clients tolerate any change}
    Q4 -->|Intermittent or agent driven| Serverless[Use serverless scale to zero Postgres]
    Q4 -->|Steady high utilization| Provisioned[Use provisioned managed Postgres or Limitless sharding]
    Q5 -->|No client change allowed| Proxy[Use a wire protocol proxy for sharding]
    Q5 -->|Coordinator connection is acceptable| Extension[Use an in database sharding extension]
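
The same tree can be written as a small function, which is convenient for embedding in a runbook or checking against real cases. The parameter names and returned strings are illustrative; the branches mirror the flowchart above one for one.

```python
def recommend(bottleneck: str, *, reads_tolerate_lag: bool = False,
              self_managed: bool = False, intermittent: bool = False,
              client_change_allowed: bool = False) -> str:
    """Encode the first decision tree: start from the symptom, not the tool."""
    if bottleneck == "connections":
        return "connection pooler in transaction mode"
    if bottleneck == "reads":
        if reads_tolerate_lag:
            return "read replicas behind a load balancer"
        return "scale the primary vertically first"
    if bottleneck == "writes_or_storage":
        if not self_managed:
            if intermittent:
                return "serverless scale-to-zero Postgres"
            return "provisioned managed Postgres or Limitless sharding"
        if client_change_allowed:
            return "in-database sharding extension"
        return "wire-protocol proxy for sharding"
    raise ValueError(f"unknown bottleneck: {bottleneck!r}")

print(recommend("connections"))
print(recommend("reads", reads_tolerate_lag=True))
print(recommend("writes_or_storage", self_managed=True))
```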

A second tree narrows the choice once sharding is confirmed as necessary. The deciding factors are transaction guarantees, client tolerance for change and the operating model.

flowchart TD
    Need[Sharding is required] --> T1{Do you need cross shard ACID transactions}
    T1 -->|Yes, strong guarantees| T2{Is a coordinator node acceptable}
    T1 -->|No, mostly single shard| T3{Must the client stay unchanged}
    T2 -->|Yes| Citus[Choose an in database extension with two phase commit]
    T2 -->|No, fully managed only| ManagedShard[Choose managed sharding from a cloud provider]
    T3 -->|Yes, drop in required| ProxyShard[Choose a wire protocol proxy]
    T3 -->|No, re platform acceptable| Evaluate[Evaluate extension and proxy and managed together]

The trees share one structural feature worth stating directly. Every path that ends in a proxy or an extension keeps ordinary PostgreSQL servers behind the wire protocol, which means the exit cost stays low. Every path that ends in a proprietary managed feature trades that exit option for operational relief, even though the endpoint still speaks the same protocol. Neither trade is wrong. The trade should be made deliberately, with the protocol boundary in view, rather than discovered after the fact.

Sources