r/AIAssisted

▲ 395 r/AIAssisted+6 crossposts

Priorities: Making AI Powerful > Making AI Safe

How I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway

Long-time lurker, first time posting. Hey everyone!

So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever.

This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process.

That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one.

I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into:

- Tool names sit in the model context, so the model can guess or forge them

- "Dangerous mode" is one config flag away from default

- Memory management has no concept of instruction priority

- The audit story is mostly "the model thought it should"

I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one.

So I made it myself.

CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis:

The LLM never holds the security boundary.

What that means in code:

Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names.
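A minimal Python sketch of the indirection idea, not CrabMeat's actual code. The class name `CapabilityTable`, its methods, and the tool names are hypothetical; the 12-hex-character suffix simply mirrors the `cap_a4f9e2b71c83` example above.

```python
import hashlib
import hmac
import secrets


class CapabilityTable:
    """Maps opaque per-session IDs to real tool names (hypothetical helper)."""

    def __init__(self, tool_names):
        # Fresh random key per session; it is never placed in the model context.
        self._key = secrets.token_bytes(32)
        self._by_id = {self._derive(name): name for name in tool_names}

    def _derive(self, tool_name):
        digest = hmac.new(self._key, tool_name.encode("utf-8"), hashlib.sha256)
        return "cap_" + digest.hexdigest()[:12]

    def ids_for_model(self):
        # Only the opaque IDs are exposed to the model.
        return sorted(self._by_id)

    def resolve(self, cap_id):
        # Unknown or forged IDs resolve to None; the caller must reject them.
        return self._by_id.get(cap_id)


session_a = CapabilityTable(["read_file", "send_email"])
session_b = CapabilityTable(["read_file", "send_email"])
```

Because each session has its own key, an ID seen in one session does not resolve in another, and a real tool name passed as an ID resolves to nothing.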

Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass.
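To illustrate why a pure check is easy to test exhaustively, here is a small Python sketch. The names `Effect`, `is_allowed`, and `all_grants` are assumptions for illustration only; the four classes are the ones listed above.

```python
from enum import Enum
from itertools import combinations


class Effect(Enum):
    READ = "read"
    WRITE = "write"
    EXEC = "exec"
    NETWORK = "network"


def is_allowed(tool_effect, agent_effects):
    """Pure check: the result depends only on the two arguments."""
    return tool_effect in agent_effects


def all_grants():
    """Every possible set of effect classes an agent could declare."""
    effects = list(Effect)
    return [
        frozenset(combo)
        for size in range(len(effects) + 1)
        for combo in combinations(effects, size)
    ]


# With 4 classes there are only 16 grants x 4 tool classes = 64 cases,
# so every input can be checked rather than sampled.
cases = [(tool, grant) for grant in all_grants() for tool in Effect]
```

With no hidden state, the whole decision table fits in a single test loop.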

IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them.
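One way to picture "non-compactable" is a compactor that only ever summarizes unpinned messages. The Python below is a sketch under that assumption; `Message`, the `pinned` flag, `compact`, and `placeholder_summary` are hypothetical names, not CrabMeat's API, and the summary function is a stand-in for a real summarizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag marking non-compactable instructions


def placeholder_summary(messages):
    # Stand-in for a real summarizer (for example, an LLM call).
    return f"[summary of {len(messages)} messages]"


def compact(messages, keep_recent=2, summarize=placeholder_summary):
    """Summarize old unpinned messages; pinned ones pass through untouched."""
    pinned = [m for m in messages if m.pinned]
    rest = [m for m in messages if not m.pinned]
    if len(rest) <= keep_recent:
        return pinned + rest
    cut = len(rest) - keep_recent
    old, recent = rest[:cut], rest[cut:]
    # Pinned instructions stay verbatim at the top of the context.
    return pinned + [Message("system", summarize(old))] + recent


history = [Message("system", "Do not delete anything until told to.", pinned=True)]
history += [Message("user", f"email {i}") for i in range(5)]
compacted = compact(history)
```

The summarizer never receives the pinned message, so it has no opportunity to drop or reword it.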

Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too.
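For readers who haven't seen a hash-chained log, here is a minimal Python sketch of the general technique. The entry layout, the `GENESIS` value, and the function names are assumptions, not the project's real format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash, event):
    # Canonical JSON so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain, event):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(chain):
    """Recompute every link; any edited entry breaks the chain from there on."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"type": "tool_call", "tool": "read_file"})
append(log, {"type": "scheduler_run", "job": "nightly"})
```

A chain like this detects edits to existing entries; detecting truncation of the tail also requires keeping the latest hash somewhere outside the log.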

Streaming output leak filter. Secrets are caught mid-stream, even across token boundaries: capability IDs, API keys, JWTs, and PEM blocks are redacted before they reach the client.
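Catching a secret that is split across chunks generally means holding back a short tail of the stream until it can no longer be the start of a match. The Python sketch below shows that idea for a single fixed-length pattern (the capability ID shape from above). `StreamRedactor` is a hypothetical name, and unbounded patterns such as PEM blocks would need a different strategy.

```python
import re

CAP_ID = re.compile(r"cap_[0-9a-f]{12}")
CAP_ID_LEN = 16  # len("cap_") + 12 hex characters


class StreamRedactor:
    """Redacts fixed-length secrets even when they straddle chunk boundaries."""

    def __init__(self, pattern=CAP_ID, max_len=CAP_ID_LEN):
        self._pattern = pattern
        # A partial secret is at most max_len - 1 chars, so hold back that many.
        self._hold = max_len - 1
        self._buf = ""

    def feed(self, chunk):
        self._buf = self._pattern.sub("[REDACTED]", self._buf + chunk)
        cut = max(0, len(self._buf) - self._hold)
        out, self._buf = self._buf[:cut], self._buf[cut:]
        return out

    def flush(self):
        # Call once at end of stream to release the held tail.
        out, self._buf = self._buf, ""
        return out


redactor = StreamRedactor()
parts = [redactor.feed("token is cap_a4f9"), redactor.feed("e2b71c83 ok")]
parts.append(redactor.flush())
result = "".join(parts)
```

The trade-off is a small output delay: the last 15 characters are always held until more text arrives or the stream ends.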

No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded.

The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is.

I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries.

I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-locked), web fetch with SSRF protection, browser, PDF extraction, persistent memory, scheduler. All effect-classified, all dry-run-supported, all audit-logged.
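As one concrete example of what "SSRF protection" usually involves, here is a hedged Python sketch of an address check built on the standard `ipaddress` module. The function name `is_blocked_address` is hypothetical, and this is not the project's actual implementation.

```python
import ipaddress


def is_blocked_address(ip_text):
    """Return True for addresses a fetch tool should refuse to contact."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # not a valid IP literal: fail closed
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


# Cloud metadata endpoints commonly live on a link-local address.
metadata_blocked = is_blocked_address("169.254.169.254")
```

A real fetcher would need to apply a check like this to every address a hostname resolves to, and again after each redirect, since the hostname alone proves nothing.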

Finally, I created a single-file Windows installer so you can literally download, set up, and use in like, five minutes. PySide6 wizard handles Node install, config generation, the works. End user needs nothing preinstalled. Linux/WSL is two-terminal manual right now; that's a v0.1.1 cleanup.

CrabMeat was built with Claude Code. I want to be specific about that because "I used an AI" is a meaningless statement and "Claude Code wrote my project" is usually a lie. What's actually true is that this project would not exist in this shape, on this timeline, without a workflow built around Claude Code as a core tool, and I think the workflow is worth describing, because it really pushes away from the idea of 'I just told it to build the thing and it did'. It was genuine work to get it finished.

The core loop I landed on uses Claude Code for architectural work and patching, and separate models (Codex / DeepSeek) for adversarial red-teaming and audits against the same codebase. Claude Code is good at building correctly. A different model under different prompting is better at attacking what was built (Codex specifically was REALLY good at this). Running them against each other on every security-relevant subsystem found three critical silent-failure bugs in an earlier project of mine (SIGIL) that I never would have caught with one model alone, and that pattern became the audit playbook I used for CrabMeat's security surface. Claude Code patched the bugs, Codex tried to break the patches, Claude Code patched again, repeat until clean.

I keep a single global instruction file (CLAUDE.md) that defines how Claude Code interacts with my projects: code style, commit message conventions, what counts as "done," when to ask before acting. This file is the closest thing I have to a senior-engineer voice in the room. It catches a lot of "you didn't ask if I wanted this" moments before they happen and it saves me literally millions of tokens of reiterations, debugging, hallucinations, and confusion.

I built up roughly 21 reusable Claude Code skills over the course of CrabMeat and adjacent projects. None of these are taken from anywhere else. They're specific to my own workflow, not something generic. "Run the security audit playbook." "Generate a release changelog from git log." "Verify a published release against its tag." The skills are what turn one-off prompts into a real pipeline. As an aside, this was a formalization of a method I had been using for a while; realizing it was 'official' now let me dump everything into an official channel. Absolute perfection. *chef's kiss*

Parallel Claude Code instances ran on independent subsystems. For the heavy work, I ran multiple Claude Code instances overnight against different parts of the codebase, one on the email connector, one on the audit chain, one on the launcher build, etc. This is only safe because each subsystem has clear boundaries and its own test surface, and because the audit chain catches drift between them. It never edits or changes anything in the codebase, only audits and then writes me a detailed report in markdown. Every security-relevant PR goes through a deliberate "now break this" pass before it lands. Sometimes that's me, sometimes that's a fresh Claude Code instance with adversarial prompting, sometimes that's Codex. The point is the pass exists and it's structured, not vibes. None of this is vibes. Everything is deliberate.

What Claude Code didn't do: it didn't want the program to exist, it didn't design the architecture, it didn't make the security decisions, it didn't make decisions for me, and it didn't write the threat model. The thesis... the "LLM never holds the security boundary"... is mine, and Claude Code's job was to help me implement it cleanly and catch my own mistakes. Which, let's be honest, are a lot. The relationship that works for me is "Claude Code is a very capable engineer on my team who needs clear specs and code review." The relationship that doesn't work is "Claude Code is a magic project generator." If you treat it as the second, you ship something that looks finished but isn't. It absolutely is not that and when I stop LEARNING from using it, I might as well stop using it entirely.

The honest take: I write better code with Claude Code in the loop than without. Specifically, I write more thorough code. Better tested, better commented, more defensively structured. Because the cost of doing it RIGHT dropped and the cost of skipping it stayed the same. That's the productivity gain. I don't think it makes me "10x faster"; it's how I actually finish the boring 30% that I used to skip. If you're using Claude Code for serious projects and not already doing the adversarial-second-model thing, try it. It's the single highest-leverage change I've made to my workflow this year.

This is v0.1.0 and calling it 1.0 would be a lie. The README has an honest four-tier stability table: "Stable, beta, experimental, not-recommended-for-network-exposed." The core loop and security rails are stable. Some subsystems are beta. A few are experimental. No part of it is 1.0-mature and I'm not going to pretend it is.

It has not been formally audited. I'd love red team reports. SECURITY.md has a coordinated disclosure path. This is a passion project. I'd rather have ten people running it carefully than ten thousand running it like OpenClaw got run.

The repo: https://github.com/mr-gl00m/crabmeat

Happy to answer questions, hear what I got wrong, or get torn apart in the comments. This is the first time most of this work has been seen outside my own machine and I'd rather find the holes now than later.

— Cid

u/RestingFrames — 2 hours ago

🔥 Hot ▲ 6.3k r/AIAssisted+5 crossposts

Humanity's greatest hits: things we actually paused

u/KeanuRave100 — 16 hours ago

▲ 192 r/AIAssisted+5 crossposts

u/KeanuRave100 — 19 hours ago

▲ 2 r/AIAssisted

Help?! Im newish

So I'm newish to AI. I'm 32M. I'm not 100% unfamiliar with computers and stuff, but my level of knowledge doesn't go past editing a JavaScript d2 kol bot years ago on d2 1.14. I honestly do not know how to ask for help, because I am a bit lost on what I am doing. I have been using AI to help me learn a bunch of different topics.

It started with a timeline of my life, and led to me using 3 LLMs to help me code.

I DO NOT CLAIM

To know how to code, to be good at anything, to know anything more than someone else, or to be smart.

I AM HOWEVER- Not an excuse

Doing 90% from my phone while working and at home, as I only have one Mac, and it's my gf's (she works from home), so I can never use it.

Learning how to read, write, and the most important, THE VOCABULARY. I have strong pattern rec skills, but I KNOW that will only get me so far and I feel like I am hitting a wall.

I wrote an AI summary in another place and was attacked immediately for my lack of knowledge, though I try to make it clear, I AM ONLY 30 DAYS IN AND HAVE NO FREAKING CLUE WHAT IM DOING MORE THAN TRIAL AND ERROR ON MY OWN.

So my question is this.

IF anyone would be willing to take the time to look at anything I've done or let me explain more of what I am doing.

It's a mix of prompting, AI generated code, agents, monolithic Python (the first thing I am working on and trying to understand before branching into ANY other topics on code), SDD, documentation, and also learning all the tools and things in parallel. I used AI to google 30 days ago. Please understand, I am learning.

u/lostsoulfs — 11 hours ago

▲ 19 r/AIAssisted+13 crossposts

How do you actually test a voice AI agent without calling it yourself every time?

So we've been working on a voice bot that handles customer calls and honestly the testing part has been brutal. We were literally calling the thing ourselves to check if it broke after every change.

Eventually we just wrote a framework that synthesizes fake caller audio, pipes it into the agent, and checks if the response is sane — latency, hallucinations, whether it handles interruptions, etc. Runs locally against a SQLite db, no cloud stuff.

It connects over websockets, can mock twilio streams, works with elevenlabs and vapi agents too. You can also plug in ollama as the judge so the whole thing runs offline.

We open sourced it: https://github.com/unforkopensource-org/decibench

Curious how others here handle this. Are you just vibing and hoping production doesn't break or is there a better workflow I'm missing?

u/Tricky_School_4613 — 13 hours ago

▲ 2 r/AIAssisted+2 crossposts

An Auditing Protocol for Human-AI Sessions: Free HTML Test to Measure Clarity, Coherence, Emphasis, and More

Sharing a protocol I developed for auditing co-creation sessions with language models (LLMs). It's a single HTML form, no external dependencies, designed to evaluate both model performance and user experience.

Why this might be relevant

In long interactions, conversation quality tends to fluctuate. Sometimes the model loses the thread, shifts its tone, or drifts from the initial goal, and it's not always clear whether it's a technical failure or an effect of the session dynamics. This test offers a systematic way to track it.

What it measures

· Model (3C+1E): Clarity, Compactness, Coherence, and Emphasis (fidelity to the goal declared at the start of the session).

· User (SSJ): Speed (whether the session flows or stalls), Struggle (cognitive cost), and Joy (whether the interaction feels rewarding).

· Conversational ruptures: where and why the interaction broke, and how (or if) it recovered.

· Regulatory checks: flags potential violations of the EU AI Act's Article 5 (manipulative techniques, exploitation of vulnerability) and cross-platform contamination.

An unexpected finding

In tests with three different models performing the same task (translating an essay into native English), the data showed that:

· The Joy metric stayed at 0 in all cases, even when the technical outputs were solid.

· The main source of drift was cross-contamination: feeding one model's outputs into another destabilised the sessions.

· The model that received the most initial trust (and thus the heaviest workload) scored the worst — a bias the test helps identify.

The deferred phase

The protocol includes an optional phase 24 hours later: the results are shared with the model and analysed together. This second look often reveals patterns that went unnoticed in the heat of the session.

In summary

· Compatible with any LLM (local or API).

· Quick to complete (5–10 minutes after a session).

· Exports data as JSON for longitudinal tracking.

· Licensed CC BY 4.0, completely free.

Link to the test: https://doi.org/10.6084/m9.figshare.32320875

The file includes the HTML form and a User Guide. This is a Beta version (v3); feedback is welcome from anyone who works intensively with LLMs and wants to try it under real conditions.

u/Fluid-Pattern2521 — 9 hours ago

▲ 340 r/AIAssisted+20 crossposts

AI will deduce ethics from first principles

u/KeanuRave100 — 1 day ago

▲ 211 r/AIAssisted+3 crossposts

AI risk bell curve

u/KeanuRave100 — 1 day ago

▲ 2 r/AIAssisted

Runway quietly removing unlimited and calling it an "upgrade"

they started to cancel unlimited for max and called it an "upgrade"

I've been on runway's unlimited plan, built my whole workflow around it. relaxed (aka slow) mode, sure, but unlimited generations meant I could actually experiment without watching a credit counter.

woke up today and everything is MAX plan now.

the thing that gets me is HOW they did it. not a single email saying "hey, we are changing to max plan." just gone. and when you go looking for answers, the support loop sends you to discord where half the responses are condescending at best.

i get that companies change pricing. i really do. but there's a little difference between adjusting your business model & communicating it and just restructuring it overnight.

now i'm sitting here reassembling my entire workflow around credits i didn't sign up for because I liked testing in 480p..

been looking at alternatives honestly.

anyone else feel like runway is slowly pricing out the people who actually built their audience around this tool? or is it just me

u/SolidSnakesCranny — 14 hours ago

▲ 4 r/AIAssisted+3 crossposts

Need help setting up an AI video workflow trying to go from 30 min/video to 5 min/video

Hey everyone,

I'm running a small news content team (5 people) making 60-second vertical explainer videos with AI avatars. Right now each video takes about 30 minutes of manual work writing scripts, generating avatars, making infographics, stitching everything together.

We're trying to hit 80 videos/day and the current process just doesn't scale.

What I'm trying to build:

Basically a workflow where I can give it a news topic (like "RBI credit growth" or "startup funding trends") and it spits out:

A script

Voice audio

Avatar lipsync video

2-3 infographic/cutaway images

Edit timeline with exact timings

Right now I'm doing all of this manually across different tools and it's delaying us.

What I have:

I already have Claude Pro, and I've been experimenting with chaining prompts, but I'm not a developer so I'm hitting walls with the automation part. I can get Claude to write great scripts and storyboards, but then I still have to manually paste prompts into 5 different tools.

What I need help figuring out:

Can this be done entirely through Claude with MCP servers? (I saw Higgsfield has an MCP connector, not sure what else)

Should I be using API calls + some kind of script to chain everything?

Is there a no-code way to automate this that I'm missing?

Are there better tools I should be using instead?

I don't need it to be perfect. I just need something that reduces the manual copy-paste hell and gets us from 30 minutes to like 5-10 minutes per video.

The videos are pretty formulaic:

Indian avatars speaking to camera (20-25 seconds)

2-3 infographic cutaways (35-40 seconds total)

We add text overlays manually in the editor

Has anyone built something like this? Or know if Claude + MCP can actually handle this end-to-end? Open to any suggestions just trying to figure out the simplest path that actually works.

Not trying to hire an agency or spend months on a custom build. Just want something scrappy that works so we can scale up production.

Any ideas?

u/Master-Conclusion-78 — 21 hours ago

▲ 9 r/AIAssisted+1 crossposts

I'm looking for devs to publish modules to our store

Free or paid modules are welcome. Let me know if you have any questions: https://threatcrush.com/store

u/IndividualAir3353 — 20 hours ago

▲ 3 r/AIAssisted

I tested an AI presentation tool that does deep research before generating slides

I’ve been testing an AI presentation tool, and I wanted to share something I found interesting.

Most AI presentation tools I’ve tried are good at turning a prompt into slides, but the content often feels shallow.

But this tool I tested is different. Before generating the presentation, it tries to do deeper research and reason through the topic first. For example, instead of just creating slides like overview / benefits / challenges / conclusion, it tries to think through the whole thing.

Also, the slides feel less like basic templates and more like a polished deck, especially for cover slides, section dividers, and concept visuals.

My takeaway is that AI presentation tools are becoming more useful when they combine research + reasoning + visual generation, instead of only focusing on slide templates.

u/ElectricalPilot2297 — 24 hours ago

▲ 2 r/AIAssisted

Hospitality software worth the budget in 2026

After being burned by hospitality software that promised the world and didn't deliver, here's the budget-friendly stack that's earned its keep for me running short term rentals. Sharing because most "best tools" lists in this space are sponsored content and the genuinely useful tools rarely make the cut.

boom is the hospitality software that has everything: channel manager, owner reporting, and guest messaging in one platform. The chaining between functions is the part that earns its budget spot; a guest message triggering a cleaning task and updating owner reporting from one input is what makes consolidation pay off.
pricelabs for dynamic pricing or something else in that space, watches comp data and adjusts your nightly rates without manual input. One of those things you can technically do yourself but probably shouldn't past a certain portfolio size unless you enjoy losing money on suboptimal pricing.
minut for noise and occupancy monitoring, useful if your properties are in noise-sensitive locations or have neighbor relations to manage. Catches parties before they spiral, which is the kind of problem that's cheaper to prevent than to clean up afterward.
canva for property listings and owner-facing materials, not strictly hospitality software but every operator ends up needing design tooling for marketing assets and the speed-to-output is unmatched compared to alternatives.
otter for transcribing owner calls and team meetings, sounds boring but the searchable transcripts have paid for themselves many times when I need to look up what was agreed on months ago.

These five plus a decent channel manager (if your pms doesn't include one) is most of what you need. The mistake I see operators make is buying a separate tool for every problem instead of consolidating where it makes sense, which is how the typical hospitality software bill creeps up to ridiculous numbers per door.

u/ssunflow3rr — 1 day ago

▲ 6 r/AIAssisted+3 crossposts

I built a self-evolving AI kernel that mutates its own architecture. MIT-licensed, runs on CPU.

FLUX is an open-source Python kernel that orchestrates local language models (via Ollama) into a self-modifying ecosystem. It's not a wrapper — it's an evolutionary substrate.

**What it does:**

- An **Attractor** receives a question and generates an answer using a fast model (TinyLlama).

- A **Judge** evaluates the answer on a 0–1 scale.

- If confidence drops below 1/φ (≈0.618, the golden ratio), the **Mutation Engine** triggers.

- A **MetaDesigner** (powered by Hermes 3 or DeepSeek-Coder) writes a new `.flux` ecosystem file — a formal grammar for describing cognitive architectures — which gets parsed, tested, and applied if it improves performance.

- A **Growth Supervisor** monitors stability and transitions the kernel from GROWTH to PRODUCTION.

**What's different:**

- It mutates its own structure, not just model weights.

- It has memory (confidence history with EMA).

- It uses a custom language (`.flux`) with a Lark parser — not YAML, not JSON.

- It runs on modest hardware: I tested it on a Xeon without AVX2, 20 GB RAM. No GPU.
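To make the trigger arithmetic concrete, here is a small Python sketch of the 1/φ threshold and an exponential moving average of judge scores. The function names and the smoothing factor `alpha` are assumptions, and the post does not say whether the trigger uses the raw score or the smoothed history, so both pieces are shown separately.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
THRESHOLD = 1 / PHI  # approximately 0.618


def should_mutate(confidence):
    """True when a judge score in [0, 1] falls below 1/phi."""
    return confidence < THRESHOLD


def ema_update(previous, score, alpha=0.3):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    return alpha * score + (1 - alpha) * previous


# Fold a short, made-up run of judge scores into a smoothed confidence value.
smoothed = 1.0
for score in (0.9, 0.4, 0.5):
    smoothed = ema_update(smoothed, score)
```

Smoothing trades reaction speed for stability: a single bad answer moves the average only part of the way toward the new score.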

**The companion novel:**

There's also a novel (Italian + English, CC BY-NC-SA 4.0) that tells the story of a man who finds this exact kernel running on a forgotten server. If you read the novel, you can compile the kernel and everything connects. The novel is the manual.

**Repo:**

[github.com/flux-genotype/nodo_zero](https://github.com/flux-genotype/nodo_zero)

**Licenses:** Kernel = MIT. Novel = CC BY-NC-SA 4.0.

Happy to answer questions about the architecture, the mutation logic, or the `.flux` grammar.

u/Inner-Dot-7490 — 1 day ago

▲ 1 r/AIAssisted+1 crossposts

I stopped paying for AI writing tools by running everything locally on my machine — here's my setup

For the past few months I've been using Ollama to run AI models locally and slowly replaced every paid AI tool I was using.

My current workflow:

Writing emails: I highlight my rough notes on any webpage, right-click, and get a full structured email in seconds. Never leave the tab I'm working in.

Job applications: I uploaded a screenshot of my resume once. Now when I find a job posting I just select the description, hit Job Apply, and get a personalized application email using my actual skills and experience. Not a generic template.

Explaining images: anything on my screen I don't understand, I snip it and ask the model to explain it. Error messages, diagrams, screenshots from docs.

Rewriting: select any text anywhere on the web, rewrite it, shorten it, make it professional, casual, whatever I need.

All of this runs on gemma4 locally. Zero API costs, zero subscriptions, nothing leaves my machine.

The only cost was the time to set up Ollama, download a model (it even works with Ollama cloud), and install a simple Chrome extension.

After that it's completely free forever.

Anyone else running a similar local AI workflow? Curious what models people are using for writing tasks specifically.

u/Illustrious_Act_8819 — 22 hours ago

▲ 7 r/AIAssisted+1 crossposts

managing a small IG account and one thing that always confused me was how unclear the follow list is.

You get this feeling that something changed, people coming and going, but you can’t really confirm it inside the app.

been using something that shows follow and unfollow changes more clearly.

What surprised me wasn’t the data itself, but how often patterns show up. Like certain types of accounts getting followed/unfollowed around the same time.

It made me realize how much behavior is happening behind the scenes that we don’t really see.

Not saying it changes everything, but it gave me a different perspective on how attention shifts on social platforms.

u/Living-Minute4116 — 1 day ago

▲ 4 r/AIAssisted+1 crossposts

Help to find appropriate tool for generating anime/cartoon video based on real video

Can someone please help if there's any tool to generate a comic/anime version of a video clip?
Thanks in advance.

u/codenamehitman47 — 1 day ago

▲ 3 r/AIAssisted+1 crossposts

Why is the Voice Mode so bad?

Regularly I try to use the live voice modes on different services like ChatGPT, Perplexity and Grok, but the experience is always so bad.

Why don’t they use the models they use when doing stuff in text?

It’s probably because of trying to maintain a low latency during the chat, but why not say “Let me research that for you…” or have 2 agents running, with 1 reporting back while the other agent is thinking?

The live models are so lazy and thus unusable in 90% of the cases for me.

What do you think?

u/HoarderOfBytes — 1 day ago
