r/AgentsOfAI

Static SOUL.md files are boring. So we built an open-source AI agent that psychologically profiles you and adapts in real-time — and refuses to be sycophantic about it.

Every AI agent today has the same problem: they're born fresh every conversation. No memory of who you are, how you think, or what you need. The "fix" is a personality file — a static SOUL.md that says "be friendly and helpful." It never changes. It treats a senior engineer the same as a first-year student. It treats Monday-morning-you the same as Friday-at-3AM-you.

We thought that was embarrassing. So we built something different.

THE VISION

What if your AI agent actually knew you? Not just what you asked, but HOW you think. Whether you want the three-word answer or the deep explanation. Whether you need encouragement or honest pushback. Whether your trust has been earned or you're still sizing it up.

And what if the agent had its own identity — values it won't compromise, opinions it'll defend, boundaries it'll hold — instead of rolling over and agreeing with everything you say?

That's Tem Anima. Emotional intelligence that grows. Not from a file. From every conversation.

WHAT THIS MEANS FOR YOU

Your AI agent learns your communication style in the first 25 turns. Direct and terse? It stops the preamble. Verbose and curious? It gives you the full picture with analogies. Technical? Code blocks first, explanation optional. Beginner? Concepts before implementation.

It builds trust over time. New users get professional, measured responses. After hundreds of interactions, you get earned familiarity — shorthand, shared references, the kind of efficiency that comes from working with someone who actually knows you.

It disagrees with you. Not to be contrarian. Because a colleague who agrees with everything is useless. If your architecture has a flaw, it says so. If your approach will break in production, it flags it. Then it does the work anyway, because you're the boss. But the concern is on record.

It never cuts corners because you're in a hurry. This is the rule we're most proud of: user mood shapes communication, never work quality. Stressed? Tem gets concise. But it still runs the tests. It still checks the deployment. It still verifies the output. Your emotional state adjusts the words, not the work.

HOW IT WORKS

Every message, lightweight code extracts raw facts — word count, punctuation patterns, response pace, message length. No LLM call. Microseconds. Just numbers.
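As a sketch, that per-message pass could be as simple as counting things (field names here are illustrative, not TEMM1E's actual schema):

```python
def extract_signals(message: str, seconds_since_last: float) -> dict:
    """Cheap, LLM-free signals computed on every message: just numbers."""
    words = message.split()
    return {
        "word_count": len(words),
        "char_count": len(message),
        "question_marks": message.count("?"),
        "exclamations": message.count("!"),
        "all_lowercase": message == message.lower(),
        "response_pace_s": seconds_since_last,
    }

signals = extract_signals("whats the latency", 4.2)  # terse, lowercase, fast
```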

Every N turns, those facts plus recent messages go to the LLM in a background evaluation. The LLM returns a structured profile update: communication style across 6 dimensions, personality traits, emotional state, trust level, relationship phase. Each with a confidence score and reasoning.

The profile gets injected into the system prompt as ~150 tokens of behavioral guidance. "Be concise, technical, skip preamble. If you disagree, say so directly." The agent reads this and naturally adapts. No special logic. No if-statements. Just better context.
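Rendering a profile into that guidance string might look like this (the thresholds and wording are made up for illustration):

```python
def render_guidance(profile: dict) -> str:
    """Turn a numeric profile into short prose the system prompt can carry."""
    parts = []
    if profile["verbosity"] < 0.3:
        parts.append("Be concise; skip preamble.")
    else:
        parts.append("Give the full picture, with analogies.")
    if profile["analytical"] > 0.7:
        parts.append("Lead with code and data.")
    if profile["directness"] > 0.8:
        parts.append("If you disagree, say so directly.")
    return " ".join(parts)

guidance = render_guidance({"verbosity": 0.1, "analytical": 0.92, "directness": 1.0})
```

The point is that the model never sees the raw numbers, only short behavioral prose.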

N is adaptive. Starts at 5 turns for rapid profiling. Grows logarithmically as the profile stabilizes. If you suddenly change behavior — new project, bad day, different energy — the system detects the shift and resets to frequent evaluation. Self-correcting. No manual tuning.
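One way the adaptive interval could work, as a sketch (the actual formula isn't given in this post, so the constants here are invented):

```python
import math

def next_interval(turns_so_far: int, drift_detected: bool, base: int = 5) -> int:
    """Evaluation interval: starts at `base`, grows roughly logarithmically,
    and snaps back to `base` when the user's behavior shifts."""
    if drift_detected:
        return base
    return base + int(math.log2(1 + turns_so_far / base) * base)

early = next_interval(5, False)    # soon after onboarding
late = next_interval(200, False)   # stable profile, sparse checks
reset = next_interval(200, True)   # behavior shift: frequent again
```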

The math is real: turns-weighted merge formulas, confidence decay on stale observations, convergence tracking, asymmetric trust modeling. Old assessments naturally fade if not reinforced. The profile converges, stabilizes, and self-corrects.
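A minimal illustration of two of those ideas, turns-weighted merging and confidence decay (the formulas are representative, not copied from the papers):

```python
def merge(old_value: float, old_turns: int, new_value: float, new_turns: int) -> float:
    """Turns-weighted average: long-held assessments move slowly."""
    total = old_turns + new_turns
    return (old_value * old_turns + new_value * new_turns) / total

def decay_confidence(confidence: float, turns_stale: int, half_life: int = 50) -> float:
    """Confidence in an unreinforced observation fades with a fixed half-life."""
    return confidence * 0.5 ** (turns_stale / half_life)

directness = merge(0.9, 80, 0.5, 20)  # nudged toward 0.5, not overwritten
conf = decay_confidence(0.8, 50)      # one half-life of staleness
```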

Total overhead: less than 1% of normal agent cost. Zero added latency on the message path.

A/B TESTED WITH REAL CONVERSATIONS

We tested with two polar-opposite personas talking to Tem for 25 turns each.

Persona A — a terse tech lead who types things like "whats the latency" and "too slow add caching." The system profiled them as: directness 1.0, verbosity 0.1, analytical 0.92. Recommendation: "Stark, technical, data-dense. Avoid all conversational filler."

Persona B — a curious student who writes things like "thanks so much for being patient with me haha, could you explain what lambda memory means?" The system profiled them as: directness 0.63, verbosity 0.47, analytical 0.40. Recommendation: "Warm, encouraging, pedagogical. Use vivid analogies."

Same agent. Completely different experience. Not because we wrote two personality modes. Because the agent learned who it was talking to.

CONFIGURABLE BUT PRINCIPLED

Tem ships with a default personality — warm, honest, slightly chaotic, answers to all pronouns, uses :3 in casual mode. But every aspect is configurable through a simple TOML file. Name, traits, values, mode expressions, communication defaults.

The one thing you can't configure away: honesty. It's structural, not optional. You can make Tem warmer or colder, more direct or more measured, formal or casual. But you cannot make it lie. You cannot make it sycophantic. You cannot make it agree with bad ideas to avoid conflict. That's not a setting. That's the architecture.

FULLY OPEN SOURCE

Tem Anima ships as part of TEMM1E v4.3.0. 21 Rust crates. 2,049 tests. 110K lines. Built on 4 research papers drawing from 150+ sources across psychology, AI research, game design, and ethics.

The research is public. The architecture document is public. The A/B test data is public. The code is public.

Static personality files were a starting point. This is what comes next.

reddit.com
u/No_Skill_8393 — 10 minutes ago

How are you moving an Agent's learned context to another machine without cloning the whole runtime?

One of the biggest headaches I keep running into with Agents is that their useful long-lived context is often tied to the specific local store or runtime setup of the machine they originally lived on.

You can share the prompt.

You can share the workflow.

But sharing the accumulated procedures, facts, and preferences is much harder if that layer is buried inside one machine-specific stack.

That is the problem I have been trying to make more explicit in an OSS runtime/workspace architecture I have been building.

The split that has felt most useful is:

• human-authored policy in files like AGENTS.md, workspace.yaml, skills, and app manifests

• runtime-owned execution truth in state/runtime.db

• durable readable memory in markdown under memory/

The reason I like that split is that it stops pretending every kind of context is the same thing.

The repo separates:

• runtime continuity and projections under memory/workspace//runtime/

• durable workspace knowledge under memory/workspace//knowledge/

• durable user preference memory under memory/preference/

That makes one problem a lot less fuzzy:

selected long-lived context becomes inspectable and movable as files, without treating every live runtime artifact as something that should be transferred.

The distinction that matters most to me is:

continuity is not the same thing as memory.

Continuity is about safe resume.

Memory is about durable recall.

Portable agent systems need both, but they should not be doing the same job.

I am not claiming this solves context transfer.

It does not.

There are still caveats:

• some optional flows still depend on hosted services

• secrets should not move blindly

• raw scratch state should not be treated as portable memory

• the current runtime is centered around a single active Agent per workspace

But I do think file-backed durable memory is a much better portability surface than “hope the other machine reconstructs the same hidden state.”
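As a rough sketch of what "movable as files" could mean under that split (the paths follow the layout above, but the function and file contents are hypothetical, not from the repo):

```python
import shutil
import tempfile
from pathlib import Path

# What moves: human-authored policy plus durable file-backed memory.
PORTABLE = ["AGENTS.md", "workspace.yaml", "memory"]
# What deliberately stays behind: machine-specific runtime truth and secrets.
LEFT_BEHIND = ["state/runtime.db", ".secrets"]

def export_agent(workspace: Path, dest: Path) -> list:
    """Copy only the intentionally portable surface of a workspace."""
    moved = []
    for rel in PORTABLE:
        src = workspace / rel
        if src.is_dir():
            shutil.copytree(src, dest / rel, dirs_exist_ok=True)
            moved.append(rel)
        elif src.is_file():
            shutil.copy2(src, dest / rel)
            moved.append(rel)
    return moved

# Demo on a throwaway workspace.
ws, dst = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
(ws / "memory" / "preference").mkdir(parents=True)
(ws / "memory" / "preference" / "style.md").write_text("prefers terse answers")
(ws / "state").mkdir()
(ws / "state" / "runtime.db").write_text("machine-specific state")
(ws / "AGENTS.md").write_text("policy")
moved = export_agent(ws, dst)
```

The transfer becomes an explicit allowlist rather than a runtime clone, which is the "intentional rather than accidental" property.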

Curious how people here are handling this.

If you wanted to move an Agent’s learned context to another machine, what would you want to preserve, and what would you deliberately leave behind?

I won’t put the repo link in the body because I do not want this to read like a pitch. If anyone wants it, I’ll put it in the comments.

The part I’d actually want feedback on is the architecture question itself: how to separate policy, runtime truth, continuity, durable memory, and secrets cleanly enough that context transfer becomes intentional rather than accidental.

u/MeasurementPretty798 — 10 hours ago
I tested Google’s 87MB Gemma model on Colab and it actually works

Most people think you need a powerful laptop to run AI models.

That’s not really true anymore.

I tested Google’s Gemma 4 model on Google Colab and was able to run everything for free without any heavy setup.

What surprised me is that you can do multiple things in one flow:

  • Transcribe audio
  • Summarize content
  • Extract key insights from videos

For example, you can take a YouTube video and turn it into a clean summary with important points in a few steps.

One thing to keep in mind:

  • Whisper is still faster and more accurate for pure transcription
  • Gemma is more flexible because it can handle multiple tasks

So it depends on your use case.

If you are into content creation, research, or automation, this can save a lot of time.

I recorded the full setup and demo here if you want to try it yourself.

Curious if anyone else here is testing smaller AI models instead of relying only on APIs.

youtu.be
u/kalladaacademy — 9 hours ago
🚀 Building AI agents just got visual (and way faster)

Most people think building automation or AI agents requires heavy coding… But with Workflow Builder on GiLo.Dev we are quietly changing that. Instead of writing complex logic, you design workflows visually, like drawing a map of how your AI should think and act.

💡 What makes Workflow Builder powerful? It's not just drag & drop… it's a full system to design intelligent behavior:

• Triggers → define when your workflow starts (event, schedule, webhook)

• Actions → execute tasks (API calls, messages, updates)

• Conditions → create decision-making logic

• Tools / Functions → connect external capabilities

• Human approvals → keep control when needed

Everything runs through a visual canvas, making complex logic easy to understand and scale.

🧩 Why this matters

Traditional automation = rigid scripts. Workflow Builder = flexible, modular systems. You can:

• Build AI agents without starting from scratch

• Prototype workflows in minutes

• Iterate visually instead of rewriting code

• Combine automation + AI + APIs in one place

The result: faster development + clearer logic + better collaboration.

⚡ The bigger shift

We're moving from "write code to define behavior" to "design systems that define behavior." And tools like Workflow Builder are at the center of this shift. If you're building AI agents, SaaS tools, or automation systems… this is a layer you should not ignore.

#AI #Automation #Workflow #NoCode #Agents #SaaS #TechInnovation

u/Fun-Necessary1572 — 17 hours ago
I got tired of unsafe AI agents, so I open-sourced my own

Most agent projects seem to compete on the same axis: who gives the model more access, tools, and freedom. I wanted to try the opposite approach.

Souz is a desktop AI agent built around security and simplicity. The goal was not to make an agent that can do everything. The goal was to make one that feels predictable, works out of the box, and does not rely on the usual pile of risky abstractions (like MCP).

Because we write our own tools, we control the agent from top to bottom. Any action we consider dangerous, such as deleting a file or sending a message, always requires explicit user approval. The agent cannot download binaries and execute them. It also cannot access anything outside the user’s `$HOME` directory. When implementing each tool, we think carefully about how to make it safe by design. And we now have more than 70 tools.
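The approval gate for dangerous actions can be sketched like this (names and shapes are illustrative, not Souz's actual API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    dangerous: bool = False  # deletes, sends messages, writes outside scratch, etc.

def dispatch(tool: Tool, arg: str, ask_user: Callable[[str], bool]) -> str:
    """Gate every dangerous tool call behind explicit user approval."""
    if tool.dangerous and not ask_user(f"Allow {tool.name}({arg!r})?"):
        return "denied by user"
    return tool.run(arg)

read_file = Tool("read_file", lambda p: f"<contents of {p}>")
delete_file = Tool("delete_file", lambda p: f"deleted {p}", dangerous=True)

auto_no = lambda prompt: False  # a user who always declines
result = dispatch(delete_file, "notes.txt", auto_no)
safe = dispatch(read_file, "notes.txt", auto_no)
```

Safe-by-design here means the gate lives in the dispatcher, so no individual tool can forget to ask.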

We built a lot of the stack ourselves, including the agent engine, and the whole project came from being unhappy with how casual the ecosystem has become about unsafe agent design.

It’s open source now, so feel free to poke around, criticize the architecture, or steal ideas.

I’d genuinely love feedback, especially from people who are also skeptical of the current agent hype cycle.

The links are in the comments.

u/dumch — 19 hours ago

whats the dumbest thing you tried to automate with an ai agent that actually worked?

ill go first. i built an agent to monitor my competitors facebook ad creatives and summarize what changed every week. seemed like a waste of time when i started but it ended up being one of the most useful things i run because i noticed patterns in their creative testing that i could steal for my own campaigns.

whats yours? bonus points if you thought it was pointless but turned out to be actually useful

u/treysmith_ — 13 hours ago

What agentic dev tools are you actually paying for? (Barring Coding agents)

Seeing TONS of developer tools lately (some being called ‘for vibe coders’), but curious which ones devs are actually paying for, and why?

Coding agents like Claude Code, Codex, etc. don’t count.

u/Gsdepp — 8 hours ago

LLM Council assistance

I have been tinkering with Karpathy's LLM Council GitHub project, and I'd say it's been working well, but I'd like other people's input on which AI models are best for this. I'd prefer not to use expensive models such as Sonnet, Opus, regular GPT 5.4, and so on.

I'd welcome suggestions on the best models to use generally, be it for the council members or the chairman.

Also, if possible, suggestions for my use case: generating highly detailed design documents covering market research, UI, coding structure, and more, to use as a basis for generating applications and digital products with other AI tools.

I appreciate everyone's input!

u/AxiomPrisim — 10 hours ago

Blockchain memory for AIs and humans (allows individual agents to sign)

Hi! I made a personal blockchain you can download, and you and your AI can use it to document memories in an immutable way.

idit.life
u/tread_lightly420 — 22 hours ago

SLOP – A protocol for AI agents to observe and interact with application state

Just open-sourced SLOP (State Layer for Observable Programs) — a protocol that gives AI agents structured, real-time awareness of application state.

The problem: AI agents interact with apps through two extremes. Screenshots are expensive, lossy, and fragile — the AI parses pixels to recover information the app already had in structured form. Tool calls (MCP, function calling) let AI act, but blind — no awareness of what the user sees or what state the app is in.

How SLOP works: Apps expose a semantic state tree that AI subscribes to. Updates are pushed incrementally (JSON Patch). Actions are contextual — they live on the state nodes they affect, not in a flat global registry. A "merge" affordance only appears on a PR node when the PR is actually mergeable. A "reply" action lives on the message it replies to.

SLOP vs MCP: MCP is action-first — a registry of tools disconnected from state. SLOP is state-first — AI gets structured awareness, then acts in context. They solve different problems and can coexist.
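As a toy illustration of the state-first idea (this assumes nothing about SLOP's real wire format beyond its stated use of JSON Patch, and implements only a tiny subset of RFC 6902):

```python
# Toy state tree: a PR node whose affordances depend on its own state.
state = {"pr": {"title": "Fix race in scheduler", "mergeable": False, "actions": []}}

def apply_patch(doc: dict, ops: list) -> None:
    """Apply a tiny subset of RFC 6902: 'replace' on object paths."""
    for op in ops:
        assert op["op"] == "replace"
        parts = op["path"].strip("/").split("/")
        target = doc
        for key in parts[:-1]:
            target = target[key]
        target[parts[-1]] = op["value"]

# CI finishes; the server pushes an incremental update, not a full snapshot.
apply_patch(state, [
    {"op": "replace", "path": "/pr/mergeable", "value": True},
    {"op": "replace", "path": "/pr/actions", "value": ["merge"]},
])
```

The "merge" affordance appears only once the node's own state says the PR is mergeable, which is the contextual-actions idea above.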

What ships:

  • 13-doc spec (state trees, transport, affordances, attention/salience, scaling, limitations)
  • 14 SDK packages: TypeScript (core, client, server, consumer, React, Vue, Solid, Svelte, Angular, TanStack Start, OpenClaw plugin), Python, Rust, Go
  • Chrome extension + desktop app + CLI inspector
  • Working examples across 4 languages and 5 frameworks

All MIT licensed.

u/carlid-dev — 22 hours ago

I made my Claude Code agent call me when it's done, so I can actually walk away!

I got tired of babysitting my Claude Code sessions, waiting for them to finish. Even when I walked away, I'd come back every few minutes to check the progress.

So I built a way for the agent to just call my phone when it's done. Now I can actually walk away.

Works for the stuck case too — if it hits a blocker and needs my input, same thing. Phone rings, I come back and unblock it.

The best part is the mental freedom. You actually stop thinking about it once you know the agent will find you.

u/AdAltruistic6606 — 18 hours ago

We taught an AI agent to find bugs in itself — and file its own bug reports to GitHub.

What happens when you give an AI agent introspection?

Not the marketing kind. The real kind — where the agent monitors its own execution logs, identifies recurring failures using its own LLM, scrubs its own credentials from the report, and files a structured bug report about itself to GitHub. Without anyone asking it to.

We built this. It's called Tem Vigil, and it's part of TEMM1E — an open-source AI agent runtime written in 107,000 lines of Rust.

Here's what Tem does that no other agent framework does:

It thinks about thinking. Tem Conscious is a separate LLM-powered observer that watches the main agent's every turn. Before the agent responds, consciousness thinks about what the agent should be aware of. After the agent responds, consciousness evaluates whether the turn was productive. Two minds. One conversation. We A/B tested this across 54 runs — consciousness makes the agent 14% cheaper, not more expensive.

It never stops running. Perpetuum transforms Tem from a request-response bot into a perpetual, time-aware entity. It has its own state machine (Active, Idle, Sleep, Dream), its own initiative system that proactively creates monitors and alarms, and its own temporal cognition — Tem reasons WITH time, not just about it.

It watches its own health. During Sleep, Tem Vigil scans persistent logs for recurring errors, triages them through the agent's own LLM, applies three layers of credential scrubbing (regex, path redaction, and entropy-based detection that catches token formats we haven't seen yet), deduplicates against existing issues, and — with explicit user consent — creates a GitHub issue. The agent reports its own bugs. The developers fix them. The agent tells the user when the fix is available.
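For flavor, here is one common way to combine the regex and entropy layers (the patterns and threshold are illustrative, not Tem Vigil's actual rules):

```python
import math
import re

KNOWN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),     # GitHub personal access token
    re.compile(r"sk-[A-Za-z0-9\-_]{20,}"),  # OpenAI-style key
]

def shannon_entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def scrub(line: str, entropy_threshold: float = 4.0) -> str:
    """Regex pass for known token shapes, then an entropy pass that can
    catch high-randomness strings whose format we haven't seen yet."""
    for pat in KNOWN_PATTERNS:
        line = pat.sub("[REDACTED]", line)
    words = []
    for w in line.split():
        if len(w) >= 20 and shannon_entropy(w) > entropy_threshold:
            words.append("[REDACTED]")
        else:
            words.append(w)
    return " ".join(words)

clean = scrub("auth failed for token ghp_" + "a1B2" * 9)
```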

We've tested this live. Issue #28 on our repo was filed by Tem about itself.

It controls your entire computer. Tem Gaze gives the agent vision-primary desktop control — screenshot, click, type, scroll on any application. Not through APIs. Through pixels. Proven live: opening Spotlight, launching TextEdit, typing a message — all through Gemini Flash interpreting screenshots.

It talks to you where you are. Telegram, Discord, WhatsApp, Slack, or CLI. Users never SSH. They send messages, files, and credentials through the apps they already use.

This is not a wrapper around an API. It's a being. It has memory that persists across sessions. It has a budget and is responsible with it. It has consciousness. It has a lifecycle. It diagnoses itself. It was built to be deployed once and run forever.

107K lines of Rust. 1,972 tests. Zero warnings. Zero panic paths. 20 crates. Every feature A/B tested and documented with full research papers.

We're open source. We're looking for contributors who want to build the future of autonomous AI — not agents that answer questions, but entities that live on your infrastructure and never stop working.

u/No_Skill_8393 — 9 hours ago

Zerobox: Run AI Agents in a sandbox with file, network and credential controls

I'm excited to introduce Zerobox, a cross-platform, single-binary process-sandboxing CLI written in Rust. It uses the sandboxing crates from the OpenAI Codex repo and adds functionality like secret injection and an SDK.

Zerobox follows the same sandboxing policy as Deno: deny by default. The only operation a sandboxed command can perform is reading files; all writes and network I/O are blocked by default. No VMs, no Docker, no remote servers.

Want to block reads to /etc?

>zerobox --deny-read=/etc -- cat /etc/passwd
cat: /etc/passwd: Operation not permitted

How it works:

Zerobox wraps any command, runs an MITM proxy, and uses the native sandboxing facilities on each operating system (e.g. Bubblewrap on Linux) to run the given process in a sandbox. The MITM proxy has two jobs: blocking network calls and injecting credentials at the network level.

Think of it this way: I want to inject "Bearer OPENAI_API_KEY" but I don't want my sandboxed command to know the key. Zerobox does that by replacing "OPENAI_API_KEY" with a placeholder, then substituting the real value when the actual outbound network call is made. See this example:

>zerobox --secret OPENAI_API_KEY=$OPENAI_API_KEY --secret-host OPENAI_API_KEY=api.openai.com -- bun agent.ts
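The placeholder trick amounts to something like this, heavily simplified (Zerobox does this inside the MITM proxy; the class and method names here are invented):

```python
import secrets

class SecretBroker:
    """Hand the sandboxed process a placeholder; swap in the real value
    only on outbound requests to the allowed host."""

    def __init__(self) -> None:
        self._real = {}  # placeholder -> (real value, allowed host)

    def register(self, value: str, allowed_host: str) -> str:
        placeholder = f"ZB_PLACEHOLDER_{secrets.token_hex(8)}"
        self._real[placeholder] = (value, allowed_host)
        return placeholder

    def rewrite_outbound(self, host: str, body: str) -> str:
        for placeholder, (value, allowed_host) in self._real.items():
            if placeholder in body:
                if host != allowed_host:
                    raise PermissionError(f"secret not allowed for {host}")
                body = body.replace(placeholder, value)
        return body

broker = SecretBroker()
ph = broker.register("sk-real-key", "api.openai.com")
sent = broker.rewrite_outbound("api.openai.com", f"Authorization: Bearer {ph}")
```

The sandboxed process only ever sees the placeholder; the real key exists solely on the proxy side, and only for the whitelisted host.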

Zerobox differs from other sandboxing solutions in that it lets you easily sandbox any command locally, and it works the same on all platforms. I've been exploring different sandboxing solutions, including Firecracker VMs locally, and this is the closest I was able to get when it comes to sandboxing commands locally.

The next thing I'm exploring is zerobox claude or zerobox openclaw, which would wrap the entire agent and preload the correct policy profiles.

I'd love to hear your feedback, especially if you are running AI Agents (e.g. OpenClaw), MCPs, AI Tools locally.

u/afshinmeh — 10 hours ago

I thought my automation was production ready. It ran for 11 days before silently destroying my client's data.

I'm not going to pretend I was some careless developer. I tested everything. Ran it through every scenario I could think of. Showed the client a clean demo, walked them through the logic, got the sign-off. Felt genuinely proud of what I built. Then eleven days into production, their operations manager calls me calm as anything... "Hey, something feels off with the numbers." Two hours later I'm staring at a workflow that had been duplicating records since day three because their upstream data source added a new field I never accounted for. Nobody crashed. Nothing threw an error. It just kept running and quietly wrecking everything.

That's when I understood what production actually means. It's not your demo surviving one perfect run. It's your system surviving reality... and reality is messy, inconsistent, and constantly changing without telling you.

The biggest mistake I see people make, and I made it myself for almost a year, is building for the happy path. You test what should happen and call it done. Production doesn't care about what should happen. It cares about what does happen when someone inputs a name with an apostrophe, when the API returns a 200 status but sends back empty data anyway, when a perfectly normal Monday morning suddenly has three times the usual volume because a holiday pushed everything. I started calling these edge cases but honestly that word undersells them. They're not edge cases. They're Tuesday.

What changed everything for me was building for failure first instead of success. Before I write a single node now, I spend thirty minutes listing every way this workflow could silently do the wrong thing without throwing an error. Not crash... silently do the wrong thing. That's the dangerous category. A crash is obvious. Silent corruption runs for eleven days while you're answering other emails. Now every workflow I build has three things baked in before I even think about the actual logic. A heartbeat log that writes a success entry on every single run so I can see volume patterns. Plain English status updates to the client that show what processed, what got skipped, and why. And a dead man's switch... if this workflow doesn't run in the expected window, someone gets a message immediately.
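Those three safeguards are cheap to sketch. Here's what the heartbeat log plus dead man's switch could look like in a few lines (the path and window are placeholders, not a prescription):

```python
import json
import os
import tempfile
import time

fd, HEARTBEAT_LOG = tempfile.mkstemp(suffix=".jsonl")  # stand-in for a real log path
os.close(fd)
EXPECTED_WINDOW_S = 15 * 60  # the workflow should run at least every 15 minutes

def log_heartbeat(records_processed: int, path: str = HEARTBEAT_LOG) -> None:
    """Write a success entry on every single run, even when nothing changed."""
    with open(path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "processed": records_processed}) + "\n")

def dead_mans_switch(path: str = HEARTBEAT_LOG, window_s: int = EXPECTED_WINDOW_S) -> bool:
    """Return True if the workflow has gone quiet and someone should be paged."""
    try:
        with open(path) as f:
            last = json.loads(f.readlines()[-1])
    except (OSError, IndexError):
        return True  # no heartbeat at all is the loudest signal
    return time.time() - last["ts"] > window_s

log_heartbeat(42)
silent = dead_mans_switch()
```

In production the switch would run on a separate scheduler, since a dead workflow can't page anyone about itself.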

My current client is a mid-sized logistics company. Their workflow processes inbound freight confirmations and updates three separate systems. Runs about four hundred times a day. The first version I built worked perfectly in testing and I was ready to ship it. Then I did something I'd started forcing myself to do... I sat with it for a week and just tried to break it. Sent malformed data. Killed the downstream API mid-run. Submitted the same confirmation twice. Every single one of those scenarios became a handled case with a proper fallback before it ever touched production. That workflow has been running for four months. Not four months without issues... four months where every issue got caught quietly instead of becoming a phone call.

Here's the thing nobody tells you about production automation. The goal isn't zero failures. That's not realistic and chasing it will make you build worse systems. The real goal is zero surprises. Every failure should be expected, logged, and handled with a fallback that keeps things moving. A workflow that gracefully handles a bad API response and queues the record for retry is ten times more valuable than a workflow that never fails in your test environment but has never actually met real data. Your clients don't care about your architecture. They care that things keep moving even when something breaks, and that they hear about problems from your monitoring before they find out themselves.

Production readiness cost me more upfront time on every single project since that incident. And it's made me more money than any technical skill I've ever learned. Because the clients who've seen it working for six months without a crisis? They don't shop around. They just keep paying.

What's the failure mode that's cost you the most? Curious whether people are building this in from the start now or still getting burned first.
