Elixir Was Built for Agentic AI (40 Years Early)
Erlang was designed for isolated processes with their own state, communicating over unreliable networks. That's exactly what agentic AI looks like.
TL;DR
- Erlang’s telecom heritage is an exact match for agentic AI. Isolated processes with their own state, communicating over unreliable networks. Ericsson solved this in 1986.
- GenServers are agents. Each one maintains its own state, processes messages independently, and can be supervised and restarted on failure. That’s the agent pattern.
- The BEAM handles millions of concurrent processes. While other languages reach for containers and orchestration, Elixir handles agent concurrency natively at 25KB per process.
- Fault tolerance isn’t optional with AI. LLM APIs fail, timeout, and hallucinate. Elixir’s “let it crash” philosophy and supervision trees handle this gracefully by default.
- The ecosystem is already here. Libraries like LangChain, Instructor, and Jido bring agent orchestration, structured outputs, and multi-agent systems to Elixir today.
In 1986, Ericsson had a problem. They needed software to manage telephone switches: thousands of simultaneous calls, each with its own state, all running on unreliable hardware. A dropped call couldn’t bring down the system. A failed circuit couldn’t cascade into an outage.
Their solution was Erlang and the BEAM virtual machine. Each phone call became an isolated process. Each process managed its own state. Supervisors monitored everything and restarted failures automatically. The system was designed from the ground up for isolated intelligences communicating over unreliable networks.
Forty years later, that description maps perfectly to agentic AI.
The Parallel Nobody Is Talking About
Here’s what agentic AI actually looks like at the infrastructure level. Your application spawns an agent. That agent has a task, a context, and its own state. It makes calls to external services (LLMs, tools, APIs) over networks that can fail at any time. Multiple agents might run simultaneously, each doing different work. Some agents spawn sub-agents. If one fails, the rest should keep running.
Read that paragraph again, but replace “agent” with “phone call” and “LLM” with “telephone switch.” It’s the same problem. Ericsson’s engineers solved it before most of us were born.
Elixir, which compiles to Erlang’s BEAM VM, inherits all of this. It’s not a bolted-on feature or a library you install. It’s how the language works.
GenServers Are Already Agents
A GenServer in Elixir is a process that:
- Maintains its own state across interactions
- Receives messages and responds to them
- Runs concurrently with every other process
- Can be monitored and restarted automatically
That’s an AI agent. Not metaphorically. Literally.
When you build an agentic system in Elixir, the mapping is direct. Each agent is a GenServer. Its state holds the conversation history, tool definitions, and current task. It receives messages (“here’s a user query”), processes them (calls the LLM, executes tools), and maintains context between interactions.
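In code, that skeleton is small. A minimal sketch of the mapping, where `call_llm/2` is a stub standing in for a real model client (an HTTP call to OpenAI, Anthropic, etc.):

```elixir
defmodule ChatAgent do
  # One agent: a GenServer whose state is the conversation history.
  use GenServer

  ## Client API
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def ask(agent, query), do: GenServer.call(agent, {:ask, query}, 30_000)

  ## Server callbacks
  @impl true
  def init(opts) do
    {:ok, %{task: Keyword.get(opts, :task), history: []}}
  end

  @impl true
  def handle_call({:ask, query}, _from, state) do
    # The agent's "think" step: call the model with the accumulated context.
    reply = call_llm(query, state.history)
    {:reply, reply, %{state | history: state.history ++ [{query, reply}]}}
  end

  # Stub so the sketch runs standalone; swap in a real LLM client here.
  defp call_llm(query, _history), do: "echo: " <> query
end
```

`ChatAgent.start_link(task: "triage")` gives you one agent; its state survives between `ask/2` calls, which is exactly the context persistence described above.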
Need to run five agents in parallel? Spawn five GenServers. Need one agent to delegate to three sub-agents? Have it start a DynamicSupervisor and spawn children. Need to kill a stuck agent without affecting the rest? Send it a shutdown message. Its supervisor will handle cleanup.
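The delegation pattern above is a few lines. A sketch under stated assumptions, where `SimpleAgent` is a placeholder child rather than a real agent:

```elixir
defmodule SimpleAgent do
  # Placeholder agent; a real one would hold history and call tools.
  use GenServer
  def start_link(task), do: GenServer.start_link(__MODULE__, task)
  @impl true
  def init(task), do: {:ok, %{task: task}}
end

defmodule AgentPool do
  # Dynamically spawns one supervised process per agent or sub-agent.
  use DynamicSupervisor

  def start_link(_opts),
    do: DynamicSupervisor.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: DynamicSupervisor.init(strategy: :one_for_one)

  # Spawn a sub-agent; stopping it later won't touch its siblings.
  def spawn_agent(task),
    do: DynamicSupervisor.start_child(__MODULE__, {SimpleAgent, task})
end
```

An agent that wants three sub-agents just calls `AgentPool.spawn_agent/1` three times; the `:one_for_one` strategy keeps their failures independent.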
In Python, you’d reach for threading, asyncio, or Celery. You’d build state management. You’d add retry logic. You’d worry about race conditions. In Elixir, the platform handles all of this.
Fault Tolerance for Unreliable Intelligence
LLM APIs are unreliable by nature. They time out. They return errors. They hallucinate. They change behavior between model versions. If your agent orchestration can’t handle failures gracefully, you’re building on sand.
Elixir’s “let it crash” philosophy was designed for exactly this. Instead of wrapping every API call in defensive error handling, you let processes fail and rely on supervisors to recover.
A supervisor tree for an agentic system might look like this: a top-level supervisor manages a pool of agent supervisors. Each agent supervisor manages one agent process and its tool-calling children. If an LLM call times out, the tool-calling process crashes. The agent supervisor restarts it. The agent retries with backoff. If the agent itself gets into a bad state, its supervisor restarts the whole agent. The rest of the system never notices.
This isn’t theoretical resilience. This is how Ericsson achieved nine nines of availability (31 milliseconds of downtime per year) on telephone switches. The same patterns work for AI.
Concurrency That Actually Scales
Here’s a practical concern: how many agents can you run simultaneously?
The BEAM VM can handle millions of concurrent processes. Each Elixir process uses roughly 25KB of memory at startup, compared to megabytes for OS threads or container instances. Discord runs 5 million concurrent users on Elixir. Phoenix has demonstrated 2 million simultaneous WebSocket connections on a single machine.
For agentic workloads, this means you can run thousands of agents concurrently without reaching for Kubernetes, message queues, or distributed task runners. Your orchestration layer (the part that decides which agents to spawn, how to route messages, and when to shut things down) can be a single Elixir application.
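This is easy to check on a laptop. A sketch that spawns ten thousand minimal agent processes under one dynamic supervisor (`TinyAgent` is a placeholder, not a real agent, so the footprint of a real one will be larger):

```elixir
defmodule TinyAgent do
  # Minimal agent process; a real agent adds history and tool state.
  use GenServer
  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  @impl true
  def init(id), do: {:ok, %{id: id, history: []}}
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

pids =
  for id <- 1..10_000 do
    {:ok, pid} = DynamicSupervisor.start_child(sup, {TinyAgent, id})
    pid
  end

# Rough per-process footprint, in bytes, straight from the VM.
{:memory, bytes} = Process.info(hd(pids), :memory)
IO.puts("#{length(pids)} agents running; ~#{bytes} bytes each")
```

Ten thousand supervised processes start in well under a second on commodity hardware; no external queue or scheduler is involved.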
That operational simplicity compounds. Fewer moving parts means fewer failure modes, which matters when you’re building systems that depend on unreliable external services.
The Ecosystem Is Ready
This isn’t a thought experiment. People are building agentic systems in Elixir today.
LangChain (the Elixir version, not the Python one) provides chains, agents, and tool integration with support for OpenAI, Anthropic, and Google models. It handles streaming, function calling, and conversation management.
Instructor brings structured outputs to LLM calls. Define an Ecto schema, pass it to the LLM, and get validated, typed data back. No parsing JSON strings. No hoping the model followed your format instructions. The library handles validation and retry.
Jido is a framework specifically for multi-agent systems in Elixir. Each agent is a supervised process using about 25KB of memory, with built-in action pipelines, tool registration, and inter-agent communication through PubSub.
Bumblebee runs machine learning models directly on the BEAM. Embeddings, text classification, image recognition, all without leaving your Elixir application or making network calls.
FLAME lets you burst compute-heavy work (like ML inference) to ephemeral cloud instances and return results to your running application. Think of it as auto-scaling for the expensive parts of your pipeline, without rearchitecting anything.
Streaming and Real-Time Feedback
When an agent is working (calling an LLM, executing tools, processing results), your users shouldn’t be staring at a spinner.
LiveView makes streaming agent output trivial. Because LiveView maintains a persistent WebSocket connection and the agent is a GenServer in the same application, you can push partial results to the UI as they arrive. Token by token if you want.
No WebSocket libraries. No polling. No separate real-time infrastructure. The agent process sends a message to the LiveView process. The UI updates. That’s it.
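Under the hood this is plain message passing. A sketch with an ordinary process standing in for the LiveView, and the streaming LLM call stubbed out as word-splitting; a real LiveView would receive the same messages in its `handle_info/2` callback and update its assigns:

```elixir
defmodule StreamingAgent do
  # Streams partial results (tokens) to a subscriber process as they arrive.
  use GenServer

  def start_link(subscriber), do: GenServer.start_link(__MODULE__, subscriber)
  def run(agent, prompt), do: GenServer.cast(agent, {:run, prompt})

  @impl true
  def init(subscriber), do: {:ok, subscriber}

  @impl true
  def handle_cast({:run, prompt}, subscriber) do
    # Stand-in for a streaming LLM call: emit one token at a time.
    for token <- String.split("echo: " <> prompt) do
      send(subscriber, {:agent_token, token})
    end

    send(subscriber, :agent_done)
    {:noreply, subscriber}
  end
end

# In a real app the subscriber is the LiveView process; its callback
# would look roughly like:
#
#   def handle_info({:agent_token, t}, socket) do
#     {:noreply, update(socket, :output, &(&1 <> t <> " "))}
#   end
```

The agent never knows or cares that the subscriber happens to be rendering a web page.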
This matters because agentic interactions are inherently long-running. A multi-step agent might take 30 seconds to complete a task. Without streaming feedback, that’s an unacceptable user experience. With LiveView and GenServers, you get real-time visibility into what the agent is doing at every step.
What About Python?
Python has more AI libraries. That’s true, and it matters. If you need a specific model that only has Python bindings, Python is the right choice for that piece.
But there’s a difference between ML model execution and agent orchestration. Python is excellent at the former. It’s mediocre at the latter.
Python’s concurrency story is constrained by the GIL. Its process model requires external tools (Celery, Redis, Kubernetes) for real concurrency. Its error handling is exception-based, which makes building resilient multi-agent systems an exercise in defensive coding.
Elixir’s sweet spot is the orchestration layer. The part that manages agent lifecycles, routes messages, handles failures, and maintains state. You can still call Python models from Elixir (via Bumblebee’s Nx, HTTP APIs, or Ports), but the coordination happens in a runtime that was purpose-built for it.
The 40-Year Head Start
Every new “agent framework” I see is reinventing concepts that Elixir developers use daily. State management? GenServer. Parallel execution? Task.async. Supervision and recovery? Supervisor trees. Message passing between agents? Built into the language.
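Those primitives compose in a few lines. For instance, fanning several prompts out in parallel is one pipeline (`call_llm` is a stub standing in for a real model client):

```elixir
# Stub standing in for a real LLM client call.
call_llm = fn prompt -> "echo: " <> prompt end

# Fan five prompts out across five concurrent BEAM processes;
# results come back in input order.
results =
  ["plan", "search", "summarize", "critique", "answer"]
  |> Task.async_stream(call_llm, max_concurrency: 5, timeout: 30_000)
  |> Enum.map(fn {:ok, reply} -> reply end)
```

The `timeout` option gives each call its own deadline, and a slow or crashed call fails that task alone rather than the whole batch.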
The AI industry is discovering, one framework at a time, that building reliable systems out of independent, stateful, failure-prone processes is hard. Erlang’s creators figured that out in a telephone switching office in 1986.
Elixir gives you their solution with modern syntax and a thriving ecosystem. If you’re building systems where AI agents need to run concurrently, maintain state, communicate with each other, and recover from failures (which describes any serious agentic system), it’s worth looking at the language that was designed for exactly this problem.
David Kerr is the founder of Kerrberry Systems. He builds custom software in Elixir and Phoenix, increasingly for systems that orchestrate AI. Find him on LinkedIn or GitHub.