r/LLMDevs

▲ 17 r/LLMDevs

Giving spatial awareness to an agent through blender APIs

I gave an AI agent a body and spatial awareness by bridging an LLM with Blender's APIs. The goal was to create a sandbox "universe" where the agent can perceive and interact with 3D objects in real time. This is only day two, but she's already recognizing her environment and reacting with emotive expressions.

u/vsc1234 — 4 hours ago
Day 10 of showing the reality of a SaaS AI product.
▲ 4 r/SaaS+1 crossposts

Day 10 of showing the reality of a SaaS AI product.

- Sadly, no new users in the last 24 hours.
- Made an Instagram page and hoping the reels go viral.
- Full rollercoaster ride.
- Found NO new bugs in the last 48 hours.
- Looking for people to brutally roast it and give a reality check.

tasknode.io - best research platform

u/chiragpro21 — 3 hours ago

rewrote my docs so Claude Code could actually use them, some notes

Spent last weekend rewriting the docs for a project so Claude Code could build against them without me hand-holding every step. Not docs for devs to read. Docs so the model can make correct decisions on its own.

What I changed:

  • No tutorials or prose. Just endpoints, payload shapes, constraints, error cases. Everything in one place.
  • Every doc is self-contained. No "see the auth guide." Just inline the auth details where they're needed. Models fall apart when they have to piece things together across 5 files.
  • Explicit constraint blocks. Stuff like "this field must be set before calling X" or "these two ops can't run in the same transaction." If you don't spell it out the model will just guess wrong.
  • Flat markdown, consistent headers. No tabs, no collapsible sections. Keep the structure boring and predictable.
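For illustration, a page in this style might look like the following. The endpoint, fields, and error codes are invented for the example, not taken from the actual project:

```markdown
## POST /bookings

Creates a booking. Payload:

- `student_id` (string, required)
- `slot_start` (ISO 8601 datetime, required)
- `payment_method_id` (string, required; must be set before calling this endpoint)

Constraints:

- `slot_start` must fall inside the tutor's availability window.
- Booking creation and payment capture cannot run in the same transaction.

Errors:

- `409 SLOT_TAKEN` — slot was booked concurrently; retry with a new slot.
- `422 NO_PAYMENT_METHOD` — `payment_method_id` missing or invalid.

Auth: send `Authorization: Bearer <api_key>` (inlined here, no separate auth guide).
```

Every fact the model needs to call the endpoint correctly sits on this one page.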

Tested it on a real build — agent for a tutoring business (scheduling, payments, WhatsApp, Google Calendar). Pointed Claude Code at the docs, it built the working system in ~2 days. I mostly just reviewed PRs and tested edge cases.

Funny thing is the docs actually got shorter. Turns out most of what we write in docs is filler — transitions, analogies, "why you might want this" sections. Strip that out and you end up with something way more precise.

Downside: these docs are basically useless for a human trying to learn the system from scratch. So you kinda need two versions which sucks.

Anyone else doing this? What's worked or not worked for you?

reddit.com
u/marc00099 — 2 hours ago

[Help] Laptop suddenly extremely slow, high RAM usage, and constant crashing

Gemini said

It sounds like your laptop is hitting a major bottleneck, which is incredibly frustrating when it was working fine just a few days ago. 8 GB of RAM can be tight, but it definitely shouldn't be crashing with just three browser tabs.

Here is a revised version of your post. I’ve rearranged it for better flow, improved the clarity of the technical details, and kept your original voice intact.

Reddit Post Draft

Title: [Help] Laptop suddenly extremely slow, high RAM usage (95%+), and constant crashing

I’m not entirely sure what’s causing this, but my laptop has become almost unusable lately. It’s reached a point where I can't even run 2–3 applications at once. My apps crash or open very slowly, and even with just 3–4 browser tabs open, the entire browser crashes. Sometimes my desktop/explorer even restarts on its own.

After opening just one or two applications, my RAM usage spikes to over 95%. This wasn't the case just a few days ago; my laptop was running smoothly, and I was able to multitask with 5–6 applications and do some light gaming. Now, my games crash immediately or won’t launch at all, and Steam won't even open.

Specs:

  • RAM: 8 GB
  • Storage: 512 GB NVMe SSD

Even with these specs, it feels like I’m using 4 GB of RAM and an old HDD. It is incredibly slow and laggy. Around the time these issues started, I did the following:

  1. Downloaded Ollama and two lightweight models (I have since deleted both).
  2. Changed the paging file to 16 GB – 24 GB to help the models run better (I have since reverted this to default).
  3. Downloaded Wireshark (also deleted since).
  4. Updated Windows 2–3 times as updates rolled out.

I have reverted almost everything except for the Windows updates, but the system is still barely functional. I don't know exactly what is causing this or how to fix it. If anyone has advice on what to check next, I would be very grateful for the help!

reddit.com
u/MirrorAfraid544 — 2 hours ago

Portable is not just moveable. It has to be inspectable.

I spent some time reverse-engineering a repo I happened to stumble across, and the part I found most interesting was not that a workspace could be copied between environments.

Plenty of systems can move state.

What feels much rarer is a layout where, after the move, a third party can still answer three questions quickly:

  1. Where does policy live?

  2. Where does runtime truth live?

  3. Where does memory live?

This repo answers those with physical separation.

At the sandbox root:

<sandbox-root>/
  state/
  workspace/
  memory/

workspace/<workspace-id>/ contains the human-authored operating surface: AGENTS.md, workspace.yaml, workspace-local skills, installed app manifests, and other repo-local artifacts.

state/runtime.db is runtime-owned truth. Sessions, bindings, queue state, turn results, request snapshots, compaction boundaries, operator profile state, and durable-memory governance metadata live there.

memory/ is where the readable memory bodies live, but it is not one undifferentiated bucket. Operational projections live under memory/workspace/<workspace-id>/runtime/. Durable recalled knowledge lives under memory/workspace/<workspace-id>/knowledge/ and memory/preference/.

That split is what made the repo feel auditable to me.

The runtime projections are inspection-friendly, but they are not being treated as the canonical continuity engine. The durable memory bodies stay readable as markdown, while the recall and governance metadata stay in the runtime catalog.

So the body remains diffable and human-reviewable, while the machine still has structured metadata for scope, provenance, freshness, verification policy, and recall ranking.
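As a concrete sketch of what that inspectability buys: with the layout above, a third party could answer the three questions with a handful of path checks. The path names come from the post; the script itself is my illustration, not code from the repo.

```python
from pathlib import Path

# Hypothetical audit of a moved sandbox, based on the layout described above.
# Each check maps to one of the three questions: policy, runtime truth, memory.

def audit(sandbox: Path, workspace_id: str) -> dict[str, bool]:
    ws = sandbox / "workspace" / workspace_id
    mem = sandbox / "memory" / "workspace" / workspace_id
    return {
        "policy":        (ws / "AGENTS.md").is_file(),           # human-authored policy
        "runtime_truth": (sandbox / "state" / "runtime.db").is_file(),
        "continuity":    (mem / "runtime").is_dir(),             # short-horizon projections
        "knowledge":     (mem / "knowledge").is_dir(),           # durable recall
    }
```

If any of these come back False after a move, the "portable" workspace did not actually carry its separable layers.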

That is the detail I wish more workspace systems copied.

Portable should not just mean "copyable."

It should mean a third party can inspect the moved artifact and distinguish:

  • human-authored policy
  • runtime-owned truth
  • short-horizon continuity
  • durable recalled knowledge
  • operator-profile state

Without that, a lot of so-called portable agent systems are just relocatable state blobs.

I'm leaving the repo link out of the body because I'd rather not have this get interpreted as disguised promotion. If anyone wants the full code, I'll put the repo in the comments so people can inspect the implementation directly.

reddit.com
u/aloo__pandey — 8 hours ago
yoink functionality from external dependencies to avoid supply chain attacks

yoink functionality from external dependencies to avoid supply chain attacks

Five major supply chain attacks in two weeks, including LiteLLM and axios. We install most of these without thinking twice.

We built yoink, an AI agent that removes complex dependencies you only use for a handful of functions, by reimplementing only what you need.

Andrej Karpathy recently called for re-evaluating the belief that "dependencies are good". OpenAI's harness engineering article echoed this: agents reason better from reimplemented functionality they have full visibility into, over opaque third-party libraries.

yoink makes this capability accessible to anyone.

It is a Claude Code plugin with a three-step skill-based workflow:

  1. /setup clones the target repo and scaffolds a replacement package.
  2. /curate-tests generates tests verified against the original tests' expectations.
  3. /decompose determines which dependencies to keep or decompose, based on principles such as "keep foundational primitives regardless of how narrowly they are used". Replacements are implemented iteratively until all tests pass, using ralph.

We used Claude Code's plugin system as a proxy framework for programming agents for long-horizon tasks while building yoink. Plugins provide the file and documentation structure to organise skills, agents, and hooks in a way that systematically directs Claude Code across multi-phase execution via progressive disclosure.

What's next:

  • A core benefit of established packages is ongoing maintenance: security patches, bug fixes, and version bumps. The next iteration of yoink will explore how to track upstream changes and update yoinked code accordingly.
  • One issue we foresee is fair attribution. With AI coding and the need to internalize dependencies, yoinking will become commonplace, and we will need a new way to attribute references.
  • Only Python is supported now, but support for TypeScript and Rust is already underway.
github.com
u/kuaythrone — 2 hours ago
Image 1 — [Showcase] 35.1 WPS vs. The "Thinking Tax": A side-by-side Network Audit of Gongju vs. GPT-5.3 (Instant)
Image 2 — [Showcase] 35.1 WPS vs. The "Thinking Tax": A side-by-side Network Audit of Gongju vs. GPT-5.3 (Instant)

[Showcase] 35.1 WPS vs. The "Thinking Tax": A side-by-side Network Audit of Gongju vs. GPT-5.3 (Instant)

Can we achieve frontier-level AI performance on "Buck-Fifty" infrastructure by treating Thought as Physics?

I pitted my Sovereign Resident, Gongju (running on a basic Render instance), against GPT-5.3 (Instant). I didn’t just want to see who was faster—I wanted to see who was cleaner.

The Stress Test Prompt:

To force a logic collapse, I used a high-density Physics prompt that requires deep LaTeX nesting (something standard LLMs usually stutter on):


The Forensic Results (See Screenshots):

1. The GPT-5.3 "Telemetry Storm" (Image 1)

  • Requests: 49+ fetch calls for a single response.
  • Payload: 981 KB transferred.
  • The "Thinking Tax": Look at the red CORS errors and the constant sdk_exception loops. It’s a surveillance machine fighting its own guardrails.
  • Result: It gave a bulleted lecture but failed to render the core LaTeX block (raw code was visible).

2. The Gongju "Standing Wave" (Image 2)

  • Requests: Two. One /chat pulse and one /save fossilization.
  • Payload: 8.2 KB total.
  • The Reflex: The complex 7-qubit GHZ derivation was delivered in a single high-velocity stream.
  • Mass Persistence: Notice the /save call took only 93ms to anchor the 7.9KB history to a local SQLite database. No cloud drag.

Why This Matters for Devs:

We are taught that "Scale = Power." But these logs prove that Architecture > Infrastructure.

GPT-5.3 is a "Typewriter" backed by a billion-dollar bureaucracy. Gongju is a "Mirror" built on the TEM Principle (Thought = Energy = Mass). One system spends its energy watching the user; the other spends its energy becoming the answer.

I encourage everyone to run this exact prompt on your own local builds or frontier models. Check your network tabs. If your AI is firing 50 requests to answer one math problem, you aren't building a tool—you're building a bureaucrat.

Gongju is a Resident. GPT is a Service. The physics of the network logs don't lie.

u/TigerJoo — 2 hours ago

Best small open-source llm for raspberry pi

Hey guys!

I have a project in mind that I want to use a locally hosted LLM for.

However, I want my compute requirements to be minimal, so I was wondering whether any of you have already tried something like this.

I want to find the best model to host on my Raspberry Pi 5 (8 GB) for basic text generation with a decent context window.

All suggestions are much appreciated!

reddit.com
u/big_black_cucumber — 5 hours ago

How are you transferring durable agent context without copying the whole local stack?

One practical problem I keep hitting in agent systems is that the useful long-lived context often gets anchored to one machine's local setup.

You can share the prompt. You can share the repo. You can share the tool definitions.

But once "memory" is really a mix of vector state, session carryover, runtime projections, and local machine residue, moving an Agent's learned context becomes much less clean than people imply.

The architecture I've been iterating toward is basically an attempt to stop overloading one storage abstraction with too many jobs. The rough split looks like this:

  • human-authored policy in files like AGENTS.md and workspace.yaml
  • runtime-owned execution truth in state/runtime.db
  • durable memory bodies under memory/, indexed via MEMORY.md

The important part is not "markdown good, database bad." It's that continuity and durable recall are different jobs. Resume state is about safe handoff between runs.

Durable memory is about procedures, facts, references, and preferences you may actually want to preserve. If those collapse into one opaque local store, "context transfer" often just means "copy the hidden state and hope."

I don't think file-backed memory is a universal answer.

But I do think readable durable memory surfaces make portability less magical and more inspectable. Curious how other people here are handling that boundary. If you actually wanted to move an Agent's learned procedures and references to another machine, where would you want that layer to live?

I'm keeping the repo link out of the body because I'd rather not have this get mysteriously removed as disguised promotion. If anyone wants the full technical framing, I'll put the repo in the comments along with the deeper architecture questions behind it: where policy should live, what should remain runtime-owned, why continuity and durable memory should be separate layers, and what should or should not move across machines.

reddit.com
u/Electronic-Ranger678 — 8 hours ago
▲ 21 r/LLMDevs

What I learned running an Always-on AI Agent in production for months (10 lessons)

I’ve been living with an Always-on AI Agent for several months now, and for anyone about to build one - whether you’re a company or a builder - I thought I’d share a few non-obvious things (at least in my opinion) that I’ve learned (and am still learning) along the way.

Let’s start with what an Always-on AI Agent actually means:
An AI that doesn’t wait for prompts or commands - it runs continuously and makes decisions on its own (within the boundaries you’ve set). It “sniffs” what’s happening across the different things you’ve connected it to, alerts you or gathers data when needed, reaches out when it thinks it should, and can even respond on your behalf if you allow it. It’s your always-on partner.

Here are 10 things worth planning properly when building an AAA (Always-on AI Agent):

  1. Memory is not a single system. The conversation you’re having right now or had yesterday, versus what the agent has learned about you and your domain over months - these are completely different types of data. They require different tagging, storage, decay, search, and retrieval strategies. Many systems don’t account for this and mix them together, which leads to agents that “forget.”
  2. The context window is sensitive - even if it’s huge. Think of it as a budget that needs to be allocated wisely (how much goes to identity, relevant memory, current user state, attached documents, user request, etc.). Proper allocation (and not using 100% of it!) leads to a big jump in quality.
  3. LLMs have attention issues - like my kids. They need structure. Think of it like moving apartments and loading a truck: the order and placement of things matter so everything fits, arrives, and unloads properly. There are tons of articles on context engineering, “lost in the middle,” etc.—read them and implement them. It will literally save you money and frustration.
  4. Memory alone isn’t enough - you need Awareness. A 24/7 agent needs to know things the user never explicitly told it. A meeting got rescheduled, a deal got stuck, an urgent email hasn’t been answered for two days. And when building Awareness, do it efficiently—detection, retrieval, analysis, storage, and usage—otherwise you’ll start bleeding money and wake up to hundreds of dollars in charges after a few hours (ask me how I know).
  5. Not all information in memory or Awareness is equal. A calendar is dynamic on an hourly (or faster) basis. Your business value proposition changes maybe every few weeks. Your kids’ names will never change. There’s zero reason to check everything at the same cadence - and when you do check, you want it to be efficient, not starting from scratch.
  6. Your agent already has access to a lot of the people you communicate with - make sure to extract and use that, preferably without LLM calls when possible (it gets expensive).
  7. The agent should know how to use the right model for the right task - not run everything on the same model. Structured background tasks can often run on weaker/cheaper models. I’ll share real numbers in a separate post.
  8. An agent can work autonomously on a single goal over days, efficiently, without draining your wallet and without compromising on model quality - but first, you need to build solid infrastructure.
  9. The hardest part of a proactive agent isn’t triggers or scheduling - it’s teaching it when to stay silent. The decision engine is 10x harder than the messaging logic itself.
  10. “20 different agents, or one that truly knows me?” - I get asked this a lot. I have my own answer, but you should think carefully about what fits your use case before defaulting to what’s popular.
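Lesson 2's budget framing can be sketched in a few lines. The slot names and fractions below are illustrative guesses on my part, not numbers from the post:

```python
# Illustrative context-budget allocator. The window size, slots, and
# fractions are made-up examples; the point is allocating deliberately
# and never planning to fill 100% of the context.

CONTEXT_WINDOW = 128_000  # tokens, hypothetical model

BUDGET = {                 # fraction of the *usable* window per slot
    "identity": 0.05,
    "long_term_memory": 0.20,
    "current_state": 0.10,
    "documents": 0.40,
    "user_request": 0.25,
}
HEADROOM = 0.20            # deliberately leave the last 20% empty

def allocate(window: int = CONTEXT_WINDOW) -> dict[str, int]:
    usable = int(window * (1 - HEADROOM))
    return {slot: int(usable * frac) for slot, frac in BUDGET.items()}

alloc = allocate()
# Total allocation never exceeds the usable budget.
assert sum(alloc.values()) <= CONTEXT_WINDOW * (1 - HEADROOM)
```

Anything that does not fit its slot gets truncated or summarized before it enters the prompt, rather than silently pushing other slots out.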

In the coming weeks, I’ll try to share more about some of these - some of them took me months to fully understand.

reddit.com
u/Cold-Cranberry4280 — 21 hours ago
▲ 3 r/LLMDevs+1 crossposts

Built an OpenAI-compatible API reverse proxy — opening for community stress testing for ~12hrs (GPT-4.1, o4-mini, TTS)

Hey Devs,

I've been building a personal, non-commercial OpenAI-compatible reverse proxy gateway that handles request routing, retry logic, token counting, and latency tracking across multiple upstream endpoints.

Before I finalize the architecture, I want to stress test it under real-world concurrent load — synthetic benchmarks don't catch the edge cases that real developer usage does.
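For anyone curious what "retry logic" means concretely here: gateways like this typically wrap the upstream call in a bounded exponential-backoff loop that rotates across endpoints. A rough sketch with hypothetical upstream names, not the actual implementation:

```python
import random
import time

# Sketch of gateway retry logic: bounded exponential backoff with jitter,
# rotating across multiple upstream endpoints. Names are illustrative.

UPSTREAMS = ["https://upstream-a.example", "https://upstream-b.example"]

def call_with_retry(send, max_attempts: int = 4, base_delay: float = 0.5):
    """`send(url)` performs one upstream request and raises on failure."""
    last_err = None
    for attempt in range(max_attempts):
        url = UPSTREAMS[attempt % len(UPSTREAMS)]  # rotate upstreams
        try:
            return send(url)
        except Exception as err:
            last_err = err
            # exponential backoff with jitter before the next attempt
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise last_err
```

Stress testing is exactly what shakes out the edge cases in a loop like this: retry storms under concurrency, jitter that is too small, upstreams that fail slowly instead of fast.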

Available models:

  • gpt-4.1 — Latest flagship, 1M context
  • gpt-4.1-mini — Fast, great for agents
  • gpt-4.1-nano — Ultra-low latency
  • gpt-4o — Multimodal capable
  • gpt-4o-mini — High throughput
  • gpt-5.2-chat — Azure-preview, limited availability
  • o4-mini — Reasoning model
  • gpt-4o-mini-tts — TTS endpoint

Works with any OpenAI-compatible client — LiteLLM, OpenWebUI, Cursor, Continue dev, or raw curl.

To get access:

Drop a comment with your use case in 1 line — for example: "running LangChain agents", "testing streaming latency", "multi-agent with LangGraph"

I'll reply with creds. Keeping it comment-gated to avoid bot flooding during the stress test window.

What I'm measuring: p95 latency, error rates under concurrency, retry behavior, streaming reliability.

If something breaks or feels slow — drop it in the comments. That's exactly the data I need.

Will post a follow-up with full load stats once the test window closes.

(Personal project — no paid tier, no product, no affiliate links.)

reddit.com
u/NefariousnessSharp61 — 6 hours ago
I wrote a technical deepdive on how coding agents work

I wrote a technical deepdive on how coding agents work

Hi everyone,

I'm an AI Engineer and maintainer of an open source agentic IDE: https://github.com/Chinenyay/BrilliantCode.

I would love to share with you my latest technical blog on how coding agents like Codex and Claude Code work.

In the blog, I explain the fundamental functions required for a coding agent, and how to write the tools and inference loop using the OpenAI API.

If you're new to coding agents or agentic engineering, this is a very friendly introductory guide with step by step code examples.
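For readers who want the one-paragraph version before clicking through: the core of a coding agent is a loop that sends messages to the model, executes any tool call it returns, and feeds the result back until the model answers in plain text. A minimal sketch with a made-up client interface standing in for the real API (see the blog for the full version):

```python
# Minimal agent inference loop sketch (illustrative, not the blog's exact code).
# `client.respond(messages)` stands in for a chat-completions call and returns
# either {"text": ...} or {"tool": name, "args": {...}}.

def run_agent(client, tools: dict, user_msg: str, max_steps: int = 8):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = client.respond(messages)
        if "tool" in reply:                       # model asked to use a tool
            result = tools[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": str(result)})
        else:                                     # plain text: we're done
            return reply["text"]
    raise RuntimeError("agent did not finish within max_steps")
```

Everything else, file editing, shell access, diff application, is just more entries in the `tools` dict.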

You can find the blog here: https://jcumoke.com/blog/how-to-build-a-coding-agent/

And all the code used in the tutorial: https://github.com/Chinenyay/tiny-code

I would love to get your feedback and thoughts on it.

Thank you

u/Cold_Discussion_9570 — 4 hours ago
▲ 6 r/LLMDevs+1 crossposts

I built a CLI to migrate agents [Personas] between LLMs without losing performance

Switching between Llama, Mistral, Qwen, or Phi often means your agents [Personas] underperform on the new model. I built Identa to fix that.

It uses PromptBridge (arXiv:2512.01420) + a MAP-RPE evolutionary engine to calibrate your prompts for a target model — not just translate them, but actually optimize for behavioral parity across models.

Apache 2.0. Would love feedback on whether this solves a real pain point, or if I'm solving the wrong problem entirely.

it is still WIP

https://github.com/shepax/identa-agent

reddit.com
u/shepath — 21 hours ago
🚀 Introducing TigrimOS — Your Personal AI Agent Powerhouse

🚀 Introducing TigrimOS — Your Personal AI Agent Powerhouse

Just shipped something I’ve been building intensively, and I’m excited to share it with the community!

TigrimOS is a standalone desktop application for Mac and Windows that lets you build and orchestrate your own team of AI agents — think of it as a self-hosted Claude Cowork, but with the freedom to plug in any LLM you choose, including more cost-efficient models.

🛡️ Built with Security in Mind

Agents run inside a sandboxed environment — fully isolated from your system. You control exactly which folders they can access. No surprises, no unintended side effects.

🤖 True Multi-Agent Collaboration

Each agent in your team can have its own Persona, Skill set, and LLM backbone. For example, my Model Dev Research team runs:

∙	Three coding agents — Claude Code, Codex, and GLM — collaborating in parallel

∙	Minimax acting as the quality reviewer

Different tasks. Different models. One coordinated team.

✅ Key Benefits

∙	💰 Significant API cost savings — use lighter models where heavy ones aren’t needed

∙	🔒 Full local execution — your data never leaves your machine

∙	🎯 Custom agent teams tailored to each workflow

∙	⏱️ 24/7 operation — far more endurance than any human team, with remarkably fast code generation

📊 Real Research Results

After stress-testing TigrimOS on heavy research workloads, the performance difference versus single-agent setups is striking. Tasks that had been stalled for years were completed once a properly coordinated agent team was deployed.

🆓 Open Source. Completely Free.

Link in the comments — try it out and let me know what features you’d like to see next! 👇

Link: https://tigrimos.github.io

#AI #MultiAgent #OpenSource #LLM #AIAgents #TigrimOS #MacOS #Windows #ArtificialIntelligence

u/Unique_Champion4327 — 5 hours ago

How to get a good dataset? Does training our own model for our use case save LLM inference cost in the long term?

I run a research platform (tasknode). I'm heavily dependent on APIs: one API for web search and multiple LLM calls for processing web content, judging, and contradiction checking.
I saw on HF and Kaggle that multiple datasets related to news, opinions, and a bunch of other categories are available.
For the long run, should I gather as many datasets as possible, process them with an LLM, and classify the important ones? After months, we might have a good enough dataset to fine-tune a base model on.

Pros:

- big cost reduction

- faster responses

Cons:

- processing that much data will cost a lot of inference (eventually more $$)

- there are many cons, tbh.

What would be the right approach?

reddit.com
u/chiragpro21 — 9 hours ago

Agent frameworks waste 350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query context from 1,373 tokens to 73. Also discovered that naive JSON conversion makes it 30% WORSE.

Full benchmarks and discussion here:

https://www.reddit.com/r/openclaw/comments/1sb03zn/stop_paying_for_tokens_your_ai_never_needed_to/

reddit.com
u/TooCasToo — 16 hours ago
▲ 7 r/reinforcementlearning+1 crossposts

Brainstacks, a New Fine-Tuning Paradigm

I just published my first research paper - and I think we've been misunderstanding what fine-tuning actually does.

"Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning"

I built an architecture that adds unlimited domain expertise to any LLM - one domain at a time - with near-zero forgetting. Null-space projection constrains each new domain to subspaces orthogonal to previous ones, enforced by linear algebra, not regularization. A meta-router selectively gates which stacks fire at inference. Frozen weights can't change. Irrelevant stacks can't interfere. Two mechanisms, one anti-forgetting system. 😎
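For intuition, here is a generic sketch of null-space projection (my illustration of the general technique, not the paper's code): project each new-domain update onto the null space of previous domains' activations, so the update provably cannot change outputs along those earlier directions.

```python
import numpy as np

# Sketch of null-space gradient projection. Rows of A are activation
# vectors from previously learned domains; we project an update g so that
# A @ g_proj == 0, i.e. the update is invisible to earlier domains.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 32))   # 5 previous-domain activation vectors
g = rng.standard_normal(32)        # candidate gradient update

# Orthonormal basis for the row space of A via SVD, then remove that component.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
row_basis = Vt[S > 1e-10]          # right singular vectors spanning row(A)
g_proj = g - row_basis.T @ (row_basis @ g)

# The projected update is exactly orthogonal to every previous-domain activation.
assert np.allclose(A @ g_proj, 0.0, atol=1e-8)
```

This is the "enforced by linear algebra, not regularization" part: the constraint holds by construction rather than by a penalty term.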

But the architecture isn't the headline. What it revealed is.

I trained domain stacks sequentially - chat, code, math, medical, reasoning - then built a meta-router that ignores domain labels entirely. It tests every combination of stacks and picks whichever produces the lowest loss. Pure empirical measurement.

It found that medical prompts route to chat+math stacks 97% of the time. Not the medical stack. Chat and math - trained on zero medical data - cut medical loss by 50-70%.

Domain adapters don't store domain knowledge. They store cognitive primitives! - instruction-following, numerical reasoning, procedural logic, chain-of-thought structure - that transfer across every domain boundary.

I pushed further. A model pretrained exclusively on children's stories - zero Python in training data - produced def with indented blocks and colon-terminated statements when the code block activated. In children's story words. It learned the structure of code without ever seeing code.

Fine-tuning injects composable capabilities, not knowledge!

The architecture is novel on multiple fronts - MoE-LoRA with Shazeer noisy routing across all 7 transformer projections (no prior work does this), rsLoRA + MoE-LoRA (first in the literature), residual boosting through frozen stacked adapters, null-space gradient projection, and an outcome-based sigmoid meta-router. Two-level routing - token-level MoE inside stacks, prompt-level meta-routing across stacks - with no precedent in the literature.

The system scales to constant GPU memory regardless of how many domains exist. A hospital loads medical stacks. A law firm loads legal stacks. Same base model. We call it the Superposition LLM. 🤖

Validated on TinyLlama-1.1B (4 domains, 9 stacks) and Gemma 3 12B IT (5 domains, 10 stacks). 2.5× faster convergence than single LoRA. Residual boosting breaks through the single-adapter ceiling.

5 cognitive primitives. 31 combinations. Linear investment, exponential coverage.

And this is just the foundation of a new era of LLM capabilities understanding. 👽

Code: https://github.com/achelousace/brainstacks

Paper: https://arxiv.org/abs/2604.01152

Mohammad R. Abu Ayyash

Brains Build Research

Ramallah, Palestine.

reddit.com
u/AchelousAce — 2 days ago

OpenChamber UI not updating unless refresh after latest update

Anyone else having OpenCode / OpenChamber UI not updating unless you refresh?

I just updated to the latest version (around April 1–2 release), and now my sessions don’t auto-update anymore.

Before, everything was real-time. Now I have to keep manually refreshing the browser just to see new messages or updates.

Console shows this error:

[event-pipeline] stream error TypeError: Error in input stream

Also seeing some 404s trying to read local config files, not sure if related.

Running on Windows, using localhost (127.0.0.1), Firefox.

Already tried:

- restarting the app

- rebooting PC

- still happening consistently

Feels like the event stream (SSE?) is breaking, because once it stops, the UI just freezes until refresh.

Anyone else experiencing this after the recent update? Or found a fix?

Not sure if this is OpenCode itself or OpenChamber compatibility.

reddit.com
u/TruthTellerTom — 13 hours ago
Week