Scaling Phoenix Channels beyond one node
Learning objective
After working through this guide, you'll be able to:
- Cluster a Phoenix application across multiple nodes using libcluster.
- Choose between the default :pg-based PubSub adapter and the Redis adapter based on your topic locality.
- Identify the operational signals that mean you've outgrown your single-node ceiling and need to go horizontal.
This is the follow-up to Handle 100,000 concurrent connections on a single node. You should already be running a well-tuned single-node Phoenix deployment before reaching for clustering.
Before you start
You'll need:
- A working Phoenix release deployed to at least one node, tuned per the single-node guide.
- A way for nodes to discover each other on the network: Kubernetes (DNS-based discovery), AWS (instance tag polling), or a static IP list for small fleets.
- Phoenix 1.7+ and libcluster ({:libcluster, "~> 3.3"} in mix.exs).
When you actually need to cluster
The single-node ceiling is real but generous: a well-tuned Phoenix node on commodity hardware (16 cores, 32 GB RAM) sustains 100,000–300,000 concurrent Channels at moderate message rates. Before you cluster, confirm you've hit a real ceiling and not a tuning gap:
- Memory: :erlang.memory(:total) consistently above 80% of host RAM under steady-state load.
- CPU: Per-scheduler utilization (:scheduler.utilization/1) above 80% sustained.
- Network: NIC saturation at peak (use bmon or nload).
- Process count: :erlang.system_info(:process_count) approaching +P.
If none of these are pressing, your bottleneck is somewhere else,
usually downstream of Phoenix, in the database, an external API, or a
slow handle_in/3 callback. Cluster only when single-node tuning has
actually run out of room.
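The VM-side signals in the checklist above can be read from a running node. The following is a minimal sketch, assuming a hypothetical helper module named CeilingCheck (it is not part of Phoenix or libcluster). It covers only memory and process count; scheduler utilization needs a sampling window, and NIC saturation is measured outside the VM.

```elixir
defmodule CeilingCheck do
  @moduledoc "Hypothetical helper: snapshot of VM-level ceiling signals."

  # Collects the raw numbers the checklist refers to.
  def snapshot do
    %{
      memory_total_bytes: :erlang.memory(:total),
      process_count: :erlang.system_info(:process_count),
      # The limit configured with +P (or the VM default).
      process_limit: :erlang.system_info(:process_limit)
    }
  end

  # Fraction of the process limit currently in use.
  def process_usage(%{process_count: count, process_limit: limit}) do
    count / limit
  end

  # Fraction of host RAM used by the VM. host_ram_bytes is supplied by the
  # caller; this sketch does not try to detect it.
  def memory_usage(%{memory_total_bytes: used}, host_ram_bytes) do
    used / host_ram_bytes
  end
end

snap = CeilingCheck.snapshot()
IO.inspect(CeilingCheck.process_usage(snap), label: "process usage")
```

A single snapshot says little on its own. The thresholds in the checklist are about sustained values, so sample these numbers over time under steady-state load.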
Step 1: Add libcluster
In mix.exs:
defp deps do
  [
    {:libcluster, "~> 3.3"}
  ]
end
In config/runtime.exs (or wherever you configure the cluster topology):
config :libcluster,
  topologies: [
    phoenix: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "myapp-headless",
        application_name: "myapp"
      ]
    ]
  ]
The strategy depends on your deployment target.
Cluster.Strategy.Kubernetes.DNS works with a headless Service;
Cluster.Strategy.Epmd works for static node lists;
Cluster.Strategy.Gossip works for self-discovering fleets. See the
libcluster docs for the full list.
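For a small fleet with a static node list, the same topology can use the Epmd strategy instead. This is a sketch only: the node names and IP addresses below are placeholders, not values from this guide, and they must match the names your release actually starts with.

```elixir
import Config

config :libcluster,
  topologies: [
    phoenix: [
      strategy: Cluster.Strategy.Epmd,
      config: [
        # Placeholder node names; replace with your own fleet.
        hosts: [:"myapp@10.0.0.1", :"myapp@10.0.0.2", :"myapp@10.0.0.3"]
      ]
    ]
  ]
```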
Add the supervisor to your application:
def start(_type, _args) do
  topologies = Application.get_env(:libcluster, :topologies)

  children = [
    {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
    # ... other children
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
Confirm clustering works with Node.list() in an IEx remote shell—it
should show every other node in the cluster.
Step 2: Choose your PubSub adapter
Phoenix.PubSub defaults to the :pg-based adapter, which broadcasts
every message to every other node in the cluster. This works well when
most topics have subscribers on every node—a common case for chat,
presence, and notification workloads.
It works less well when subscribers are localized. If you have a 20-node cluster but each topic only has subscribers on 2 or 3 nodes, you're paying 17 unnecessary cluster crossings per broadcast. In those workloads, the Redis adapter, which subscribes to topics rather than nodes, is more efficient at the cost of an external dependency.
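The arithmetic behind that figure can be sketched as follows. FanOut is a hypothetical module written for illustration, and it models only the node-to-node fan-out described above, not any adapter's internals.

```elixir
defmodule FanOut do
  # Remote nodes a broadcast-to-every-node adapter reaches:
  # every node except the publisher.
  def remote_deliveries(cluster_size), do: cluster_size - 1

  # Remote deliveries that land on nodes with no subscriber for the topic.
  # subscriber_nodes counts all nodes holding a subscriber;
  # publisher_subscribed? says whether the publishing node is one of them.
  def wasted_crossings(cluster_size, subscriber_nodes, publisher_subscribed?) do
    useful =
      if publisher_subscribed?, do: subscriber_nodes - 1, else: subscriber_nodes

    remote_deliveries(cluster_size) - useful
  end
end

# 20 nodes, subscribers on 3 nodes including the publisher.
IO.inspect(FanOut.wasted_crossings(20, 3, true))
# → 17
```

The same result falls out for 2 subscriber nodes when the publisher is not one of them; in both cases 17 of the 19 remote deliveries reach nodes that have nobody listening.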
Default (no change needed). Phoenix starts PubSub in your supervision
tree, and the generated child spec already uses the default
Phoenix.PubSub.PG2 adapter:
# lib/my_app/application.ex
{Phoenix.PubSub, name: MyApp.PubSub}
Redis-backed. Add the dependency, then pass the adapter and its options to the same child spec:
# mix.exs
{:phoenix_pubsub_redis, "~> 3.0"}
# lib/my_app/application.ex (replace the default PubSub child)
{Phoenix.PubSub,
 name: MyApp.PubSub,
 adapter: Phoenix.PubSub.Redis,
 redis_opts: System.get_env("REDIS_URL"),
 node_name: System.get_env("NODE_NAME")}
The choice is rarely revisited once made. Most teams stay on :pg unless
they have specific topic-locality reasons to switch.
Step 3: Tune Erlang distribution
Cross-node send/2 traverses Erlang distribution, which has its own
buffer settings. For high-throughput PubSub, raise the distribution
buffer:
+zdbbl 32768
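The value is in kilobytes, so that's 32 MB per distribution buffer (the default is 1024 KB, or 1 MB). Without it, large broadcast bursts to a remote node can fill the buffer; the runtime then suspends the sending processes until it drains, which stalls broadcasts.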
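To confirm the flag took effect, you can read the limit back from a running node. This is a small sketch using a hypothetical DistBuffer module; the only runtime call it relies on is :erlang.system_info(:dist_buf_busy_limit), which reports the limit in bytes.

```elixir
defmodule DistBuffer do
  # +zdbbl takes kilobytes; the VM reports the configured limit in bytes.
  def busy_limit_kb do
    div(:erlang.system_info(:dist_buf_busy_limit), 1024)
  end

  # Converts a +zdbbl value (kilobytes) to megabytes.
  def kb_to_mb(kb), do: kb / 1024
end

IO.puts("dist buffer busy limit: #{DistBuffer.busy_limit_kb()} KB")
IO.inspect(DistBuffer.kb_to_mb(32_768))
# → 32.0
```

In a Mix release, VM flags like this one usually live in rel/vm.args.eex, so the value read back in a remote shell should match what that file sets.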
Step 4: Verify with a multi-node load test
Repeat the load test from the single-node guide, but spread the connections across nodes. Each node in the cluster should hold about the same load it held on its own, so capacity should scale close to linearly. At roughly 100k connections per node, expect:
- 3 nodes → ~300k connections.
- 10 nodes → ~1M connections.
- Cross-node broadcast latency should add 1–5 ms over single-node latency.
If you see worse-than-linear scaling, the bottleneck is usually distribution buffer size (Step 3) or topic locality (Step 2).
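One way to put a number on "worse than linear" is to compare the measured total against the linear expectation. The sketch below uses a hypothetical ScalingCheck module; the per-node figure is whatever your own single-node test sustained, and the measured value here is made up for illustration.

```elixir
defmodule ScalingCheck do
  # Connections a cluster would hold if scaling were perfectly linear.
  def expected_connections(nodes, per_node), do: nodes * per_node

  # Ratio of measured to expected capacity; 1.0 means linear scaling.
  def efficiency(measured, nodes, per_node) do
    measured / expected_connections(nodes, per_node)
  end
end

IO.inspect(ScalingCheck.expected_connections(3, 100_000))
# → 300000

# Illustrative measurement: 3 nodes holding 240k connections in total.
IO.inspect(ScalingCheck.efficiency(240_000, 3, 100_000))
```

An efficiency that falls as you add nodes points at a cost that grows with cluster size, which is what the two causes named above have in common.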
Common pitfalls
- Sticky load balancers. Channels require WebSocket affinity—a client must stay on the same node for the life of its connection. Use sticky sessions or a load balancer that hashes by client IP.
- Node.list/0 returning empty. libcluster's discovery is asynchronous. In tests, call Process.sleep(500) after Cluster.Supervisor.start_link/1 to give it time. In production, monitor for the empty case and alert.
- Rolling deploys dropping connections. A clustered Phoenix deployment doesn't, by default, migrate connections during a deploy. Either drain connections gracefully (set the node's Phoenix.Endpoint to refuse new connections, wait for existing ones to disconnect, then terminate) or accept the disconnect/reconnect cycle on each deploy.
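For the empty Node.list/0 case, a fixed sleep can be replaced by polling with a deadline. This is a minimal sketch, assuming a hypothetical ClusterCheck module; the default timeout and interval are arbitrary starting points, not recommendations from libcluster.

```elixir
defmodule ClusterCheck do
  # Waits until at least `expected` other nodes are connected, or until
  # the timeout passes. Returns {:ok, nodes} or {:error, {:timeout, nodes}}.
  def await_nodes(expected, timeout_ms \\ 5_000, interval_ms \\ 100) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    poll(expected, deadline, interval_ms)
  end

  defp poll(expected, deadline, interval_ms) do
    nodes = Node.list()

    cond do
      length(nodes) >= expected ->
        {:ok, nodes}

      System.monotonic_time(:millisecond) >= deadline ->
        {:error, {:timeout, nodes}}

      true ->
        Process.sleep(interval_ms)
        poll(expected, deadline, interval_ms)
    end
  end
end
```

In a test, ClusterCheck.await_nodes(2) returns as soon as two peers have joined, so the suite does not pay the full sleep on every run. The same check can feed a production alert when it times out.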
What you've accomplished
You now have a clustered Phoenix deployment, with PubSub distribution chosen for your topic-locality pattern and Erlang distribution tuned for the broadcast burst sizes you've measured. The capacity ceiling moves from one node to your cluster's aggregate, with the operational complexity of cross-node communication tracked explicitly.
For deeper context on what cross-node broadcasts are actually doing, see How Phoenix.PubSub distributes messages.