Handle 100,000 concurrent WebSocket connections on a single Phoenix node
Learning objective
After working through this guide, you'll be able to:
- Tune an Ubuntu 24.04 host to accept 100,000+ simultaneous WebSocket connections.
- Configure the BEAM VM and Phoenix to use those connections without running out of processes, ports, or file descriptors.
- Verify the configuration with a load test that produces evidence, not vibes.
This is a how-to, not a tutorial. It assumes you've already built a Phoenix Channel and are now operating one at scale.
Before you start
You'll need:
- Ubuntu 24.04 LTS (Noble Numbat). Most of the techniques apply to any modern systemd-based distro, but the file paths, defaults, and commands below are written for 24.04 specifically.
- Phoenix 1.7 or later.
- Root or sudo access on the host: every step below modifies kernel parameters, systemd unit configuration, or BEAM startup flags.
- A Phoenix release deployed via a systemd unit (/etc/systemd/system/myapp.service). If you're running the BEAM via mix phx.server in a shell, the systemd steps don't apply directly; see the interactive-session note in Step 1.
This guide is scoped to a single node holding 100,000 mostly-idle connections that exchange ~1 message per second per connection. If your workload is bursty (each connection emits 50 messages/sec for a few seconds at a time), the scheduler, not the socket layer, will be your bottleneck, and you should read Scaling beyond one node instead.
Step 1: Raise the file descriptor limit
Each WebSocket connection consumes one file descriptor. On Ubuntu 24.04 the default soft limit for an interactive login is 1,024—the BEAM will fail long before 100,000 connections without raising it.
The right place to raise the limit depends on how the BEAM is launched.
This is the trap that catches most people on modern Ubuntu: editing
/etc/security/limits.conf and then wondering why nothing changed.
For a systemd-managed release (the production path): systemd bypasses
PAM, so limits.conf doesn't apply. Set the limit in a drop-in override:
sudo systemctl edit myapp.service
systemd opens an editor on a new drop-in at
/etc/systemd/system/myapp.service.d/override.conf. Add:
[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart myapp
Verify against the running process, not via shell ulimit—ulimit
reflects your login session, not the service:
$ pgrep -af 'beam.smp.*myapp' | head -1
1234 /opt/myapp/erts-14.2.5/bin/beam.smp -- -root /opt/myapp ...
$ grep 'Max open files' /proc/1234/limits
Max open files 1048576 1048576 files
Both columns should be 1048576.
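A reusable version of that check helps when you verify several hosts, or re-verify after an upgrade. This is a minimal sketch in POSIX shell. The function names nofile_limits and check_nofile are hypothetical. The only assumption about the input is the /proc/<pid>/limits layout shown above.

```shell
# Hypothetical helper: print "<soft> <hard>" from the "Max open files" row
# of a /proc/<pid>/limits file.
nofile_limits() {
  awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Hypothetical helper: succeed only if both the soft and the hard limit
# equal the expected value; report a mismatch on stderr otherwise.
check_nofile() {
  nofile_file=$1
  nofile_want=$2
  # Split "<soft> <hard>" into $1 and $2 (both empty if the row is missing).
  set -- $(nofile_limits "$nofile_file")
  if [ "${1:-}" = "$nofile_want" ] && [ "${2:-}" = "$nofile_want" ]; then
    echo "ok: soft=$1 hard=$2"
  else
    echo "MISMATCH: soft=${1:-?} hard=${2:-?}, expected $nofile_want" >&2
    return 1
  fi
}
```

For example: check_nofile "/proc/$(pgrep -f 'beam.smp.*myapp' | head -1)/limits" 1048576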
For an interactive session (development, iex --remsh, ad-hoc
debugging): limits.conf is the right place after all. Edit
/etc/security/limits.conf:
clinton soft nofile 1048576
clinton hard nofile 1048576
Replace clinton with the account you log in as. Log out, log back in, and confirm with ulimit -n.
Step 2: Tune the kernel networking parameters
Create /etc/sysctl.d/99-phoenix-scale.conf:
# Listen backlog (Ubuntu 24.04 default: 4096)
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# Ceiling on socket buffer sizes an application may request (Ubuntu 24.04 default: 212992 bytes, ~208 KB)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Ephemeral port range (Ubuntu 24.04 default: 32768-60999).
# Matters when this host opens outbound TCP connections, e.g. a reverse
# proxy in front of Phoenix, or load-test clients running locally.
net.ipv4.ip_local_port_range = 1024 65535
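# Shorter FIN_WAIT_2 timeout for orphaned sockets (Ubuntu 24.04 default: 60).
# This does not shorten TIME_WAIT, which the Linux kernel fixes at 60 seconds.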
net.ipv4.tcp_fin_timeout = 15
Apply:
sudo sysctl --system
--system loads the files in /etc/sysctl.d/ and the other system sysctl
directories in lexical order by filename. The 99- prefix makes your file load after, and override, lower-numbered package-shipped files.
To verify a specific value:
$ sysctl net.core.somaxconn
net.core.somaxconn = 65535
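Spot-checking one key can miss a file that was only partly applied. The minimal sketch below compares every key = value line in the drop-in with the live value under /proc/sys. The function name is hypothetical. PROC_SYS exists only so the function can be pointed at a fake tree for testing. The sketch assumes one plain setting per line, with no sysctl.d prefixes such as -.

```shell
# Hypothetical helper: compare each "key = value" line of a sysctl drop-in
# with the live kernel value. Returns non-zero if any value differs.
verify_sysctl_conf() {
  sysctl_conf=$1
  sysctl_root=${PROC_SYS:-/proc/sys}
  sysctl_rc=0
  # Split on the first "="; the "|| [ -n ... ]" keeps a final unterminated line.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    # Skip blank lines and comments.
    case $key in ''|'#'*|';'*) continue ;; esac
    # Collapse whitespace so "1024 65535" matches the tab-separated /proc value.
    want=$(printf '%s' "$value" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
    path="$sysctl_root/$(printf '%s' "$key" | tr '.' '/')"
    if [ -r "$path" ]; then
      have=$(tr -s '[:space:]' ' ' < "$path" | sed 's/^ //; s/ $//')
    else
      have='(missing)'
    fi
    if [ "$have" = "$want" ]; then
      echo "ok       $key = $have"
    else
      echo "MISMATCH $key: want '$want', have '$have'"
      sysctl_rc=1
    fi
  done < "$sysctl_conf"
  return "$sysctl_rc"
}
```

For example, verify_sysctl_conf /etc/sysctl.d/99-phoenix-scale.conf exits non-zero if any setting didn't take.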
Step 3: Raise BEAM process and port limits
In rel/vm.args.eex (or wherever your mix release defines VM args):
+P 1048576
+Q 1048576
+K true
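- +P 1048576: maximum simultaneous Erlang processes. The default on modern Erlang/OTP is 262,144. Phoenix also uses more than one process per connection: a transport process for the socket plus one process per joined Channel. At 100k connections that each join one topic, that's roughly 200,000 processes before your baseline, uncomfortably close to the default. Setting the limit explicitly restores the headroom and documents the requirement.
- +Q 1048576: maximum simultaneous ports. This is the one that bites you. Each TCP socket is a port, and the documented default is 65,536. At 100k Channels you're already past it before counting database connections, HTTP clients, or anything else with a socket. The runtime raises that default when it detects a higher file descriptor limit, but then the effective value silently depends on Step 1 having worked; pinning it removes that dependency.
- +K true: kernel poll (epoll on Linux). Modern Erlang/OTP enables this by default and treats the flag as a no-op; it's kept here only because older tuning guides still call for it.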
Rebuild the release (MIX_ENV=prod mix release) and redeploy.
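After redeploying, confirm the flags reached the running VM rather than trusting the file. Mix releases include an rpc command that evaluates an expression on the running node, so a sketch like the following can read both limits. The /opt/myapp install path and the helper names are assumptions. The check accepts values at or above the target because the runtime may round a limit up.

```shell
# Hypothetical helper: $1 is the release script (assumed path, e.g.
# /opt/myapp/bin/myapp); rpc evaluates the expression on the live node.
vm_limit() {
  "$1" rpc "IO.puts(:erlang.system_info(:$2))"
}

# Hypothetical helper: check that process_limit and port_limit are >= $2.
check_vm_limits() {
  vm_bin=$1
  vm_min=$2
  vm_rc=0
  for vm_item in process_limit port_limit; do
    vm_value=$(vm_limit "$vm_bin" "$vm_item")
    # Non-numeric or empty output (e.g. rpc failed) counts as a failure.
    if [ -n "$vm_value" ] && [ "$vm_value" -ge "$vm_min" ] 2>/dev/null; then
      echo "ok    $vm_item = $vm_value"
    else
      echo "FAIL  $vm_item = ${vm_value:-?}, want >= $vm_min"
      vm_rc=1
    fi
  done
  return "$vm_rc"
}
```

For example: check_vm_limits /opt/myapp/bin/myapp 1048576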
Step 4: Configure the Phoenix endpoint and socket
In config/runtime.exs:
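config :myapp, MyAppWeb.Endpoint,
  http: [
    ip: {0, 0, 0, 0},
    port: 4000,
    # Ranch listener option: the connection cap is enforced here
    transport_options: [max_connections: :infinity],
    # Cowboy HTTP protocol options
    protocol_options: [idle_timeout: 60_000]
  ]
:max_connections is a Ranch transport option, so it goes under
transport_options; placed under protocol_options it has no effect.
Pin it explicitly rather than relying on the adapter default, which has
been 16,384 in some versions. A default shift on upgrade is the kind of
thing that produces a 3 a.m. page.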
These options are for the Cowboy adapter. Since Phoenix 1.7.11, newly
generated apps default to the Bandit adapter, which exposes its own
connection and timeout options. If you're on the default stack, set the
equivalents in Bandit's config instead.
In your endpoint module, where the socket "/socket" declaration lives:
socket "/socket", MyAppWeb.UserSocket,
  websocket: [
    timeout: 45_000,
    max_frame_size: 65_536,
    compress: false
  ]
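compress: false is a tradeoff. WebSocket per-message compression trades
CPU, plus per-connection memory for the compression state, against
bandwidth. At 100k mostly-idle connections sending small messages, there's
little bandwidth to save, and the cost is paid on every connection. Leave
compression off.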
Step 5: Plan your memory budget
Each Channel process holds its socket assigns, internal state from
Phoenix.Channel.Server, and an entry in Phoenix.PubSub's subscriber
ETS table. A reasonable rule of thumb is 75 KB per connection, but
measure yours—large assigns move the number substantially. For 100,000
connections:
| Component | Estimate |
|---|---|
| Channel processes (75 KB × 100k) | ~7.5 GB |
| BEAM runtime + binary heap | ~1 GB |
| OS, observability agents, headroom | ~1.5 GB |
| Total | ~10 GB |
Provision 2× your steady-state estimate, roughly 20 GB for the budget above. Ubuntu 24.04's OOM killer is no gentler than its predecessors. If the host is sized close to its steady-state usage and a small traffic spike pushes it over, the BEAM, as the largest process, is the likely victim, and every Channel goes with it.
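The Step 5 arithmetic can be turned into a small calculator so you can rerun it with your own measured per-connection figure. This is a sketch with a hypothetical function name. It uses decimal units (1 GB = 1,000,000 KB), as the table does, and doubles the total for the provisioning target.

```shell
# Hypothetical calculator for the memory budget. Arguments: per-connection KB
# (measure your own), connection count, BEAM runtime GB, OS/headroom GB.
memory_budget() {
  awk -v kb="$1" -v conns="$2" -v runtime="$3" -v os="$4" 'BEGIN {
    channels = kb * conns / 1000000   # KB -> GB, decimal units
    total = channels + runtime + os
    printf "channels=%.1fGB total=%.1fGB provision=%.1fGB\n", channels, total, 2 * total
  }'
}
```

With the table's figures, memory_budget 75 100000 1 1.5 prints channels=7.5GB total=10.0GB provision=20.0GB. If you measure 150 KB per connection instead, the provisioning target becomes 35 GB.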
Step 6: Don't extend the heartbeat
The Phoenix Channel protocol sends a heartbeat from each client every 30 seconds. At 100k connections, that's ~3,333 inbound messages per second just for heartbeats—a real load, but well within a tuned node's capacity.
A common temptation is to extend the interval to "reduce load." Don't. The heartbeat is how Phoenix detects stale connections: network partitions, client crashes, mobile devices in tunnels. Lengthening the interval lets dead connections accumulate, each one holding a file descriptor, a port, a process, and ~75 KB of memory. The cure is worse than the disease.
If heartbeat handling is genuinely your bottleneck, the answer is to scale beyond one node, not to weaken your liveness signal.
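The helper below shows the heartbeat load. It also enforces the one constraint any interval change has to respect. Phoenix's websocket timeout closes a socket that has received nothing for that long. A client interval at or above the server timeout (45 s in the Step 4 config) can therefore get healthy but otherwise quiet clients disconnected. This is a sketch with a hypothetical function name; the 30 s figure is the phoenix.js default interval.

```shell
# Hypothetical check for a proposed heartbeat interval. Arguments: connection
# count, client heartbeat interval (s), server websocket timeout (s).
heartbeat_check() {
  hb_conns=$1
  hb_interval=$2
  hb_timeout=$3
  # Every heartbeat gets a reply, so outbound matches inbound.
  awk -v c="$hb_conns" -v i="$hb_interval" \
    'BEGIN { r = c / i; printf "%d heartbeats/s in, %d replies out\n", r, r }'
  if [ "$hb_interval" -ge "$hb_timeout" ]; then
    echo "WARNING: ${hb_interval}s interval >= ${hb_timeout}s socket timeout; quiet clients may be dropped" >&2
    return 1
  fi
}
```

heartbeat_check 100000 30 45 reports 3333 heartbeats per second in each direction. Doubling the interval to 60 s halves that rate but fails the check against a 45 s timeout.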
Step 7: Load test, with evidence
Configuration changes that aren't load-tested aren't real. Install
tsung from Ubuntu's universe repository:
sudo apt install tsung
Or use websocket-bench if
you prefer a Node-based tool.
A minimum acceptable test:
- Open 100,000 connections in batches of 5,000, with a 1-second pause between batches. This avoids overwhelming the listen backlog.
- Hold the connections for at least 30 minutes. Spikes don't surface slow leaks; sustained load does.
- Drive each connection at the rate it will see in production (~1 msg/sec for the workload this guide is scoped to).
- Collect, at one-minute intervals from a remote IEx session:
  - :erlang.system_info(:process_count): should sit at about one transport process per connection plus one process per joined Channel (roughly 200,000 if each client joins a single topic), plus your baseline.
  - :erlang.system_info(:port_count): about 100,000, plus your baseline TCP sockets.
  - :erlang.memory(:total): should plateau, not climb.
  - Per-scheduler utilization via :scheduler.utilization(1), which samples all schedulers over one second. Keep each scheduler under 80% under steady-state load.
If any of those climbs steadily over the 30 minutes, you have a leak,
most likely a Channel that isn't cleaning up on terminate/2, or a
PubSub subscription that isn't being released.
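The collection can be scripted so the samples land in a file you can keep as evidence. This sketch assumes a mix release whose script supports the rpc command (for example /opt/myapp/bin/myapp). The function names and CSV layout are hypothetical. Start the loop after the ramp-up finishes, so the first sample reflects the full connection count. Scheduler utilization is left to the interactive session because it's a per-scheduler list rather than a single number. The 10% growth tolerance is an arbitrary starting point.

```shell
# Hypothetical helper: one sample of process_count,port_count,memory_total_bytes
# from the live node. $1 is the release script.
sample_once() {
  "$1" rpc 'IO.puts(Enum.join([:erlang.system_info(:process_count), :erlang.system_info(:port_count), :erlang.memory(:total)], ","))'
}

# Hypothetical helper: write COUNT samples, INTERVAL seconds apart, to a CSV.
sample_loop() {
  sl_bin=$1
  sl_out=$2
  sl_interval=${3:-60}
  sl_count=${4:-30}
  echo "unix_time,process_count,port_count,memory_total_bytes" > "$sl_out"
  sl_i=0
  while [ "$sl_i" -lt "$sl_count" ]; do
    echo "$(date +%s),$(sample_once "$sl_bin")" >> "$sl_out"
    sl_i=$((sl_i + 1))
    if [ "$sl_i" -lt "$sl_count" ]; then
      sleep "$sl_interval"
    fi
  done
}

# Hypothetical check: fail if memory in the last sample exceeds the first
# by more than PCT percent (default 10).
memory_plateaued() {
  awk -F, -v pct="${2:-10}" '
    NR == 2 { first = $4 }
    NR > 1  { last = $4 }
    END {
      if (first <= 0) { print "not enough samples"; exit 2 }
      growth = (last - first) * 100 / first
      printf "memory growth: %.1f%%\n", growth
      exit (growth > pct)
    }' "$1"
}

# Kernel-side count of established connections on the Phoenix port,
# to cross-check port_count (needs iproute2's ss).
established_on() {
  ss -Htn state established "( sport = :$1 )" | wc -l
}
```

For a 30-minute hold: sample_loop /opt/myapp/bin/myapp run.csv 60 30, then memory_plateaued run.csv.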
Common pitfalls
- limits.conf edits that did nothing. If you changed the file but the service still hits 1,024 file descriptors, you're running under systemd and need the drop-in from Step 1 instead.
- TIME_WAIT exhaustion on the load-test host. If your load generator runs locally, it'll exhaust ephemeral ports before the server does. Verify that net.ipv4.ip_local_port_range from Step 2 is applied on the generator too. Even then, the range holds only 64,512 ports, and each connection from one source IP to the same server address and port needs its own local port. Opening 100,000 connections therefore takes more than one source IP or more than one generator machine.
- Heavy work in handle_in/3. Each Channel process handles one message at a time. A handle_in/3 that does 50 ms of work caps that channel at 20 msgs/sec. Move slow work to a dedicated Task or worker; never block the channel.
- Forgetting cluster overhead. This guide is scoped to one node. When you cluster, Phoenix.PubSub broadcasts traverse the cluster, and the bottleneck moves from the channel layer to the dispatch layer. Different problem, different guide.
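To check a generator host before a run, compare its ephemeral range with the connections you plan to open from each source IP, and watch how many sockets sit in TIME_WAIT. This is a minimal sketch. The function names are hypothetical. RANGE_FILE exists only so the range reader can be tested against a fake file, and time_wait_count needs iproute2's ss.

```shell
# Hypothetical helper: number of ports in the ephemeral range.
# RANGE_FILE is overridable only for testing.
ephemeral_ports() {
  set -- $(cat "${RANGE_FILE:-/proc/sys/net/ipv4/ip_local_port_range}")
  echo $(($2 - $1 + 1))
}

# Hypothetical helper: sockets currently in TIME_WAIT on this host.
time_wait_count() {
  ss -Htan state time-wait | wc -l
}
```

With the Step 2 range, ephemeral_ports prints 64512; with the Ubuntu 24.04 default it prints 28232.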
What you've accomplished
You now have:
- An Ubuntu 24.04 host raised past the 1,024-file-descriptor default to one million, with the limit applied to the running BEAM process (not just an interactive shell).
- A BEAM VM configured with a process and port table large enough to hold 100k Channels with room to spare.
- A Phoenix endpoint and socket tuned for sustained, mostly-idle WebSocket traffic.
- A memory budget and load-test recipe that produce evidence the configuration holds under real workload.
If your traffic is going to push past this, either in connection count
or in messages-per-connection, the next stop is
Scaling beyond one node, which covers
clustering Phoenix and the tradeoffs of Phoenix.PubSub topologies.