Handle 100,000 concurrent WebSocket connections on a single Phoenix node

Learning objectives

After working through this guide, you'll be able to:

Tune an Ubuntu 24.04 host to accept 100,000+ simultaneous WebSocket connections.

Configure the BEAM VM and Phoenix to use those connections without running out of processes, ports, or file descriptors.

Verify the configuration with a load test that produces evidence, not vibes.

This is a how-to, not a tutorial. It assumes you've already built a Phoenix Channel and are now operating one at scale.

Before you start

You'll need:

Ubuntu 24.04 LTS (Noble Numbat). Most of the techniques apply to any modern systemd-based distro, but the file paths, defaults, and commands below are written for 24.04 specifically.
Phoenix 1.7 or later.
Root or sudo access on the host—every step below modifies kernel parameters, systemd unit configuration, or BEAM startup flags.
A Phoenix release deployed via a systemd unit (/etc/systemd/system/myapp.service). If you're running the BEAM via mix phx.server in a shell, the systemd steps don't apply directly; see the interactive-session note in Step 1.

This guide is scoped to a single node holding 100,000 mostly-idle connections that exchange ~1 message per second per connection. If your workload is bursty (each connection emits 50 messages/sec for a few seconds at a time), the scheduler, not the socket layer, will be your bottleneck, and you should read Scaling beyond one node instead.

Step 1: Raise the file descriptor limit

Each WebSocket connection consumes one file descriptor. On Ubuntu 24.04 the default soft limit for an interactive login is 1,024, so unless you raise it the BEAM runs out of descriptors long before it reaches 100,000 connections.

The right place to raise the limit depends on how the BEAM is launched. This is the trap that catches most people on modern Ubuntu: editing /etc/security/limits.conf and then wondering why nothing changed.

For a systemd-managed release (the production path): systemd starts services without a PAM login session, so pam_limits never applies limits.conf to them. Set the limit in a drop-in override:

sudo systemctl edit myapp.service

systemd opens an editor on a new drop-in at /etc/systemd/system/myapp.service.d/override.conf. Add:

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576

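LimitNOFILE is the setting that governs sockets. LimitNPROC caps OS-level processes and threads, not Erlang processes, which Step 3 covers. Reload and restart: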

sudo systemctl daemon-reload
sudo systemctl restart myapp

Verify against the running process, not via shell ulimit—ulimit reflects your login session, not the service:

$ pgrep -af 'beam.smp.*myapp' | head -1
1234 /opt/myapp/erts-14.2.5/bin/beam.smp -- -root /opt/myapp ...
$ grep 'Max open files' /proc/1234/limits
Max open files            1048576              1048576              files

Both columns should be 1048576.
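To make this check repeatable, for example as a post-deploy step, the comparison can be scripted. The following is a minimal sketch rather than an established tool: check_nofile is a hypothetical helper name, and the threshold you pass it is your own choice.

```shell
# check_nofile LIMITS_FILE MINIMUM
# Hypothetical helper: succeeds only if both the soft and the hard
# "Max open files" limits in LIMITS_FILE (normally /proc/<pid>/limits)
# are at least MINIMUM.
check_nofile() {
  local limits_file="$1" minimum="$2" soft hard value
  # The line reads: "Max open files  <soft>  <hard>  files"
  soft=$(awk '/^Max open files/ { print $4 }' "$limits_file")
  hard=$(awk '/^Max open files/ { print $5 }' "$limits_file")
  for value in "$soft" "$hard"; do
    case "$value" in
      ''|*[!0-9]*)
        echo "could not read numeric limits from $limits_file" >&2
        return 2 ;;
    esac
  done
  if [ "$soft" -ge "$minimum" ] && [ "$hard" -ge "$minimum" ]; then
    echo "ok: soft=$soft hard=$hard"
  else
    echo "too low: soft=$soft hard=$hard (want >= $minimum)" >&2
    return 1
  fi
}
```

One way to call it is check_nofile "/proc/$(systemctl show -p MainPID --value myapp)/limits" 100000. Resource limits are inherited by child processes, so the unit's main PID is representative even when it is a wrapper around beam.smp.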

For an interactive session (development, mix phx.server in a shell, ad-hoc debugging): limits.conf is the right place after all. Edit /etc/security/limits.conf, replacing clinton with your own login name:

clinton soft nofile 1048576
clinton hard nofile 1048576

Log out, log back in, confirm with ulimit -n.

Step 2: Tune the kernel networking parameters

Create /etc/sysctl.d/99-phoenix-scale.conf:

# Listen backlog (Ubuntu 24.04 default: 4096)
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Per-socket buffers (Ubuntu 24.04 default: ~208 KB)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# Ephemeral port range (Ubuntu 24.04 default: 32768-60999).
# Matters when this host opens outbound TCP connections, e.g. a reverse
# proxy in front of Phoenix, or load-test clients running locally.
net.ipv4.ip_local_port_range = 1024 65535

# How long orphaned sockets may sit in FIN_WAIT_2 (Ubuntu 24.04 default: 60).
# This does not shorten TIME_WAIT, which Linux fixes at 60 seconds.
net.ipv4.tcp_fin_timeout = 15

Apply:

sudo sysctl --system

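--system loads the files under /etc/sysctl.d/ (and the other sysctl.d directories) in lexical order, so the 99- prefix applies your settings after lower-numbered, package-shipped files. /etc/sysctl.conf is still read last, so make sure it doesn't set the same keys.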

To verify a specific value:

$ sysctl net.core.somaxconn
net.core.somaxconn = 65535
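Checking every key by hand gets tedious, so the comparison between the file and the live kernel values can be scripted. This is a sketch under stated assumptions: sysctl_drift is a hypothetical helper, it reads live values from /proc/sys directly instead of calling sysctl, and it handles only simple "key = value" lines like the ones above.

```shell
# sysctl_drift CONF_FILE [ROOT]
# Hypothetical helper: prints one line per key in CONF_FILE whose live
# value differs, and returns non-zero if any differ. ROOT defaults to
# /proc/sys; the parameter exists only so the function can be pointed
# at a fake tree when testing.
sysctl_drift() {
  local conf="$1" root="${2:-/proc/sys}"
  # Drop comments and blank lines; what remains is "key = value".
  grep -Ev '^[[:space:]]*(#|$)' "$conf" | (
    rc=0
    while IFS='=' read -r key want; do
      key=$(printf '%s\n' "$key" | tr -d '[:space:]')
      # net.core.somaxconn lives at ROOT/net/core/somaxconn.
      path="$root/$(printf '%s\n' "$key" | tr . /)"
      # Collapse whitespace so "1024 65535" matches the kernel's
      # tab-separated form of the same value.
      want=$(printf '%s\n' "$want" | awk '{ $1 = $1; print }')
      have=$(awk '{ $1 = $1; print }' "$path" 2>/dev/null)
      if [ "$want" != "$have" ]; then
        echo "$key: want '$want', have '$have'"
        rc=1
      fi
    done
    exit "$rc"
  )
}
```

Running sysctl_drift /etc/sysctl.d/99-phoenix-scale.conf after sudo sysctl --system should print nothing. Any output names a key that another file or a later change has overridden.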

Step 3: Raise BEAM process and port limits

In rel/vm.args.eex (or wherever your mix release defines VM args):

+P 1048576
+Q 1048576
+K true

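+P 1048576: maximum simultaneous Erlang processes. The default is 1,048,576 from OTP 27 onward but 262,144 on OTP 26 and earlier (the erts-14.2.5 shown in Step 1 is OTP 26). Every connection needs a transport process plus a process per joined Channel, so the older default leaves little room at 100k connections. Setting the flag explicitly documents the requirement and removes the dependence on the OTP version.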
+Q 1048576: maximum simultaneous ports. This is the one that bites you. The default is 65,536. Each TCP socket is a port. At 100k Channels, you're already past the default before counting database connections, HTTP clients, or anything else with a socket.
+K true: kernel poll (epoll on Linux). Since OTP 21 the VM always uses kernel polling and treats the flag as a no-op; it's kept here only because older tuning guides still call for it.

Rebuild the release (MIX_ENV=prod mix release) and redeploy.
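Because these flags are easy to lose in a refactor of the release config, a build step can assert that they are still present. The sketch below is illustrative only: vm_arg and require_vm_arg are hypothetical helpers, and they inspect the vm.args file, not the running VM.

```shell
# vm_arg FILE FLAG
# Hypothetical helper: prints the value that follows FLAG (for example
# +Q) in a vm.args-style file. Comment lines start with "#", so they
# never match a flag in the first field.
vm_arg() {
  local file="$1" flag="$2"
  awk -v flag="$flag" '$1 == flag { print $2; exit }' "$file"
}

# require_vm_arg FILE FLAG MINIMUM
# Fails if FLAG is missing, non-numeric, or below MINIMUM.
require_vm_arg() {
  local value
  value=$(vm_arg "$1" "$2")
  case "$value" in
    ''|*[!0-9]*) echo "$2 is not set to a number in $1" >&2; return 2 ;;
  esac
  [ "$value" -ge "$3" ] || { echo "$2 is $value, want >= $3" >&2; return 1; }
}
```

For example, require_vm_arg rel/vm.args.eex +Q 100000 fails the build if the port limit drops below the target. To see what the running node actually uses, ask it directly, for instance with the release's rpc command and :erlang.system_info(:port_limit) or :erlang.system_info(:process_limit).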

Step 4: Configure the Phoenix endpoint and socket

In config/runtime.exs:

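config :myapp, MyAppWeb.Endpoint,
  http: [
    ip: {0, 0, 0, 0},
    port: 4000,
    # Ranch (transport) options: the connection limit lives here.
    transport_options: [
      max_connections: :infinity
    ],
    # Cowboy (protocol) options: HTTP-level timeouts live here.
    protocol_options: [
      idle_timeout: 60_000
    ]
  ]

:max_connections is a Ranch transport option, so it belongs under transport_options; placed under protocol_options it does not raise the connection limit. Pin it explicitly rather than trusting a default: Plug.Cowboy documents a default of 16,384, well short of 100,000, and a connection cap you never set is the kind of thing that produces a 3 a.m. page.

These are Cowboy and Ranch settings. Since Phoenix 1.7.11, new apps default to the Bandit adapter, which exposes its own connection and timeout options; if you're on the default stack, set the equivalents in Bandit's config instead.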

In your Endpoint module (where the socket "/socket" declaration lives):

socket "/socket", MyAppWeb.UserSocket,
  websocket: [
    timeout: 45_000,
    max_frame_size: 65_536,
    compress: false
  ]

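compress: false is a deliberate tradeoff. WebSocket per-message compression trades CPU and per-connection memory for bandwidth. At 100k mostly-idle connections sending small messages there is little bandwidth to save: small payloads compress poorly, and every connection would carry its own compression state. Leave compression off.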

Step 5: Plan your memory budget

Each Channel process holds its socket assigns, internal state from Phoenix.Channel.Server, and an entry in Phoenix.PubSub's subscriber ETS table. A reasonable rule of thumb is 75 KB per connection, but measure yours—large assigns move the number substantially. For 100,000 connections:

Component	Estimate
Channel processes (75 KB × 100k)	~7.5 GB
BEAM runtime + binary heap	~1 GB
OS, observability agents, headroom	~1.5 GB
Total	~10 GB

Provision 2× your steady-state target, which for the budget above means about 20 GB of RAM. Ubuntu 24.04's OOM killer is no gentler than its predecessors. If the host has only the 10 GB the table adds up to and a small traffic spike pushes you over, the BEAM goes, and takes every Channel with it.
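The arithmetic behind the table is worth redoing with your own measured per-connection figure. A minimal sketch, assuming decimal units (1 MB = 1,000 KB) and taking the two fixed overheads straight from the table; memory_budget_mb is a hypothetical helper name:

```shell
# memory_budget_mb CONNECTIONS KB_PER_CONNECTION
# Hypothetical helper reproducing the table above in integer MB.
memory_budget_mb() {
  local connections="$1" kb_per_conn="$2"
  local channels_mb=$(( connections * kb_per_conn / 1000 ))
  local runtime_mb=1000   # BEAM runtime + binary heap (~1 GB)
  local os_mb=1500        # OS, observability agents, headroom (~1.5 GB)
  echo $(( channels_mb + runtime_mb + os_mb ))
}

steady_mb=$(memory_budget_mb 100000 75)
echo "steady state: ${steady_mb} MB"           # steady state: 10000 MB
echo "provision:    $(( steady_mb * 2 )) MB"   # provision:    20000 MB
```

Swapping 75 for a measured value shows how quickly the total moves: at 120 KB per connection the same call returns 14500 MB before the 2× headroom.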

Step 6: Don't extend the heartbeat

The Phoenix Channel protocol sends a heartbeat from each client every 30 seconds. At 100k connections, that's ~3,333 inbound messages per second just for heartbeats—a real load, but well within a tuned node's capacity.
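The heartbeat rate is simply connections divided by the interval, which makes it easy to see how little a longer interval would buy. A small sketch (heartbeat_rate is a hypothetical helper; the 60- and 120-second intervals are shown only for comparison):

```shell
# heartbeat_rate CONNECTIONS INTERVAL_SECONDS
# Hypothetical helper: inbound heartbeats per second, rounded down.
heartbeat_rate() {
  echo $(( $1 / $2 ))
}

for interval in 30 60 120; do
  echo "interval ${interval}s: $(heartbeat_rate 100000 "$interval") heartbeats/sec"
done
# interval 30s: 3333 heartbeats/sec
# interval 60s: 1666 heartbeats/sec
# interval 120s: 833 heartbeats/sec
```

Against the roughly 100,000 application messages per second this guide is scoped to (1 message per second per connection), heartbeats at the 30-second default are about 3% of inbound traffic, so doubling the interval saves under 2%.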

A common temptation is to extend the interval to "reduce load." Don't. The heartbeat is how Phoenix detects stale connections: network partitions, client crashes, mobile devices in tunnels. Lengthening the interval lets dead connections accumulate, each one holding a file descriptor, a port, a process, and ~75 KB of memory. The cure is worse than the disease.

If heartbeat handling is genuinely your bottleneck, the answer is to scale beyond one node, not to weaken your liveness signal.

Step 7: Load test, with evidence

Configuration changes that aren't load-tested aren't real. Install tsung from Ubuntu's package repositories:

sudo apt install tsung

Or use websocket-bench if you prefer a Node-based tool.

A minimum acceptable test:

Open 100,000 connections in batches of 5,000, with a 1-second pause between batches. This avoids overwhelming the listen backlog.
Hold the connections for at least 30 minutes. Spikes don't surface slow leaks; sustained load does.
Drive each connection at the rate it will see in production (~1 msg/sec for the workload this guide is scoped to).
Collect, at one-minute intervals from a remote IEx session:
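- :erlang.system_info(:process_count): should sit at your baseline plus at least one process per connection. Each connection has a transport process and each joined Channel is a further process, so expect roughly 200,000 at one Channel per connection.
- :erlang.system_info(:port_count): ~100,000 (one port per connection), plus your baseline TCP sockets.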
- :erlang.memory(:total): should plateau, not climb.
- Per-scheduler utilization via :scheduler.utilization(1), which samples for one second and reports each scheduler's busy fraction (the :scheduler module ships with runtime_tools). Keep each scheduler under 80% under steady-state load.
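If you would rather collect the three counters from a script than type them into IEx every minute, the release's rpc command can do the sampling. This is a sketch under assumptions: sample_beam is a hypothetical helper, the release path is taken from the Step 1 example, and it relies on rpc printing the expression's output on the calling terminal.

```shell
# sample_beam RELEASE_SCRIPT
# Hypothetical helper: prints one CSV row,
#   epoch_seconds,process_count,port_count,memory_bytes
# for the node behind RELEASE_SCRIPT (for example /opt/myapp/bin/myapp).
sample_beam() {
  local release="$1" code
  code='IO.puts(Enum.join([:erlang.system_info(:process_count), :erlang.system_info(:port_count), :erlang.memory(:total)], ","))'
  printf '%s,%s\n' "$(date +%s)" "$("$release" rpc "$code")"
}

# One sample per minute for 30 minutes, appended to a CSV file:
#   for i in $(seq 30); do
#     sample_beam /opt/myapp/bin/myapp >> samples.csv
#     sleep 60
#   done
```

Keeping the samples in a file gives you something to attach to the change record, which is the evidence this step asks for.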

If any of those climbs steadily over the 30 minutes, you have a leak, most likely a Channel that isn't cleaning up on terminate/2, or a PubSub subscription that isn't being released.
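"Climbs steadily" can be turned into a mechanical check on a column of samples. The sketch below is one possible heuristic, not a standard test: climbs is a hypothetical helper that compares the first third of the samples with the last third, and the threshold percentage is yours to choose.

```shell
# climbs THRESHOLD_PERCENT   (reads one number per line on stdin)
# Hypothetical heuristic: exits 0 if the last third of the samples
# totals more than THRESHOLD_PERCENT above the first third, 1 if not,
# and 2 if there are fewer than three samples. Both thirds hold the
# same number of samples, so comparing sums equals comparing means.
climbs() {
  awk -v pct="$1" '
    { v[NR] = $1 }
    END {
      third = int(NR / 3)
      if (third < 1) exit 2
      for (i = 1; i <= third; i++)           head += v[i]
      for (i = NR - third + 1; i <= NR; i++) tail += v[i]
      exit ((tail > head * (1 + pct / 100)) ? 0 : 1)
    }'
}
```

Feed it readings taken after the ramp has finished, for example the :erlang.memory(:total) values, one per line: printf '%s\n' 100 110 120 130 140 150 | climbs 5 exits 0 (climbing), while a noisy plateau such as 100 101 99 100 102 100 exits 1.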

Common pitfalls

limits.conf edits that did nothing. If you changed the file but the service still hits 1,024 file descriptors, you're running under systemd and need the drop-in from Step 1 instead.
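Ephemeral port exhaustion on the load-test host. Every connection from one source IP to the same server IP and port needs its own local port, so the generator runs out of ephemeral ports long before the server runs out of anything. Apply net.ipv4.ip_local_port_range from Step 2 on the generator too, and note that even the widened range (1024-65535, or 64,512 ports) is below 100,000: spread the load across multiple generator hosts or source IPs.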
Heavy work in handle_in/3. Each Channel process handles one message at a time. A handle_in/3 that does 50 ms of work caps that channel at 20 msgs/sec. Move slow work to a dedicated Task or worker—never block the channel.
Forgetting cluster overhead. This guide is scoped to one node. When you cluster, Phoenix.PubSub broadcasts traverse the cluster and the channel layer is no longer the bottleneck—the dispatch layer is. Different problem, different guide.
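The ephemeral-port pitfall above comes down to simple arithmetic. A TCP connection is identified by source IP, source port, destination IP, and destination port, so against a single server address each source IP can hold at most one connection per ephemeral port. A sketch of the calculation (ports_in_range and source_ips_needed are hypothetical helpers):

```shell
# ports_in_range LOW HIGH: number of ephemeral ports, inclusive.
ports_in_range() {
  echo $(( $2 - $1 + 1 ))
}

# source_ips_needed CONNECTIONS LOW HIGH
# Hypothetical helper: how many distinct source IPs a load generator
# needs to hold CONNECTIONS to one server IP and port (ceiling division).
source_ips_needed() {
  local ports
  ports=$(ports_in_range "$2" "$3")
  echo $(( ($1 + ports - 1) / ports ))
}

echo "default range: $(ports_in_range 32768 60999) ports, $(source_ips_needed 100000 32768 60999) source IPs"
echo "widened range: $(ports_in_range 1024 65535) ports, $(source_ips_needed 100000 1024 65535) source IPs"
# default range: 28232 ports, 4 source IPs
# widened range: 64512 ports, 2 source IPs
```

These are upper bounds on what one address can hold; ports already in use or reserved on the generator reduce them further.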

What you've accomplished

You now have:

An Ubuntu 24.04 host raised past the 1,024-file-descriptor default to one million, with the limit applied to the running BEAM process (not just an interactive shell).
A BEAM VM configured with a process and port table large enough to hold 100k Channels with room to spare.
A Phoenix endpoint and socket tuned for sustained, mostly-idle WebSocket traffic.
A memory budget and load-test recipe that produce evidence the configuration holds under real workload.

If your traffic is going to push past this, either in connection count or in messages-per-connection, the next stop is Scaling beyond one node, which covers clustering Phoenix and the tradeoffs of Phoenix.PubSub topologies.