
While working on vibe-coding projects, my co-founder detected the core limitation of agentic systems early on: they operate sequentially, so one AI agent has to wait for another to finish before it can start its own work. They are also blocking by nature - if you are building an agentic system, you probably know that almost every SDK or library exposes agents with a blocking run method (await).
So we decided to start addressing this problem and open up new possibilities: what if we could make agents act more like humans - working autonomously, in parallel, interrupting, and reacting to each other and to their environment?
For me, the hard part was structuring this agent communication. At first I couldn't even tell where to start - but after days of reading white papers and LLM provider documentation, I found the clue.
It was the OpenResponses documentation from OpenAI, Vercel, OpenRouter, and Hugging Face - they came together to specify a unified structure for context.
The spec describes every context item in depth - I will cover just the basic idea:
- DeveloperMessage
- SystemMessage
- UserMessage
- Reasoning
- FunctionCall
- FunctionCallOutput
- ModelMessage
All those items are streamable using semantic events.
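
To make this concrete, here is a minimal sketch of how those items might be modeled in TypeScript. The field names are simplified assumptions on my part, not the exact OpenResponses schema:

```ts
// A minimal sketch of the context items as a TypeScript union.
// Field names are simplified assumptions, not the exact OpenResponses schema.
type ContextItem =
  | { type: "developer_message"; content: string }
  | { type: "system_message"; content: string }
  | { type: "user_message"; content: string }
  | { type: "reasoning"; content: string }
  | { type: "function_call"; callId: string; name: string; arguments: string }
  | { type: "function_call_output"; callId: string; output: string }
  | { type: "model_message"; content: string };

// Each item can also be delivered incrementally as semantic events,
// e.g. an item starts, a series of deltas arrives, then the item completes.
type SemanticEvent =
  | { event: "item.started"; item: ContextItem }
  | { event: "item.delta"; delta: string }
  | { event: "item.completed"; item: ContextItem };
```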
That was the starting point for my framework.
Only after that did I realize I needed a stronger foundation.
I started by building a multi-provider library - but to go further and decouple my code from the infrastructure, I needed a higher degree of abstraction than simply mapping requests to LLM providers.
So I turned the specification into objects - enabling easy population of the context, persistence, and streaming.
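
For illustration, here is a hypothetical sketch of what working with such context objects could look like, reusing the ContextItem type from above. The class and method names are my assumptions, not the framework's actual API:

```ts
// Hypothetical sketch: a context that holds items, persists them as JSON,
// and replays them as a stream. Names are illustrative assumptions.
class Context {
  private items: ContextItem[] = [];

  append(item: ContextItem): void {
    this.items.push(item);
  }

  // Persist as plain JSON so a conversation can be restored later.
  serialize(): string {
    return JSON.stringify(this.items);
  }

  static deserialize(json: string): Context {
    const ctx = new Context();
    ctx.items = JSON.parse(json) as ContextItem[];
    return ctx;
  }

  // Replay items one by one - handy for streaming to consumers.
  *stream(): Generator<ContextItem> {
    for (const item of this.items) yield item;
  }
}
```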
Next, I wanted a way to intercept those context items as they are generated in real time.
My second decision was to model the agent loop as a state machine with the following states:
- user message received
- inference pending
- function call pending
- model message received
Transitions between states are driven by whatever the LLM happens to return. I added hooks for each state so developers can write logic to control the loop (a sketch follows the list):
- onUserMessage
- beforeInference
- afterInference
- beforeFunctionCall
- afterFunctionCall
- onModelMessage
It was good progress - but it still modeled the runtime of a single agent.
After that I realized that to structure agent communication, agents probably need to share context items and semantic events.
That was a mini insight I discovered down the road.
But I soon realized that the agent loop is just one concrete implementation of an agent workflow, with automated switching from one state to the next.
I needed a more abstract solution - one where developers can write logic that controls the agent's stream of context items as they are generated in real time.
So I removed the agent-loop abstraction and came up with a new idea: make both agents and humans abstract participants, and model the environment where they operate together. Participants subscribe to the environment to listen for context items. Every time a new context item arrives, each participant is notified so it can react to the other participants' work.
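
Conceptually, the environment is a publish/subscribe hub for context items. The sketch below is my own illustration of that idea; the class and method names are assumptions, not the framework's actual API:

```ts
// Illustrative sketch of an environment as a pub/sub hub for context items.
// Class and method names are assumptions made for this example.
interface Participant {
  id: string;
  // Called whenever another participant publishes a new context item.
  onContextItem(item: ContextItem, from: string): void;
}

class Environment {
  private participants: Participant[] = [];

  join(participant: Participant): void {
    this.participants.push(participant);
  }

  publish(item: ContextItem, from: string): void {
    // Notify everyone except the publisher, so they can react to it.
    for (const p of this.participants) {
      if (p.id !== from) p.onContextItem(item, from);
    }
  }
}

// Usage: two participants reacting to a human's message.
const env = new Environment();
env.join({ id: "agent-a", onContextItem: (item, from) => console.log("A saw", item.type, "from", from) });
env.join({ id: "agent-b", onContextItem: (item, from) => console.log("B saw", item.type, "from", from) });
env.publish({ type: "user_message", content: "hello" }, "human");
```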
I also introduced generators for:
- FunctionCallRunner
- InferenceRunner
- InputStream
- OpenAI Inference Context Items Generator
Participants need to provide those generators during initialization.
That was an important decision - because with abstract generators that are isolated from the communication layer, we can easily mock LLM calls and test the framework before sending a single real request.
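
For example, here is a hypothetical mock inference generator that yields canned context items instead of calling a provider; the generator's shape is my assumption for illustration:

```ts
// Hypothetical mock inference generator: yields canned context items
// instead of calling a real LLM provider, so the framework can be tested offline.
async function* mockInferenceRunner(_context: ContextItem[]): AsyncGenerator<ContextItem> {
  yield { type: "reasoning", content: "Pretending to think..." };
  yield { type: "model_message", content: "This is a mocked response." };
}

// A participant could be initialized with this generator in place of a real
// OpenAI-backed one, leaving the communication layer untouched.
```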
And voilà - we created an environment where agents naturally connect and react to the stream of context items they are producing. Not sequential. Not blocking. Event-driven. More like how humans actually work.
That's my story of building a TypeScript framework. I am excited to see what possibilities and challenges emerge when running multiple agents at once.
Star it on GitHub:
https://github.com/jigjoy-ai/mozaik