r/OpenSourceAI

I built a fully offline voice assistant for Windows – no cloud, no API keys

I spent months building Writher, a Windows app that combines faster-whisper for transcription with a local Ollama LLM for an AI assistant – everything runs on your machine.

What it does:

  • Hold AltGr → instant dictation in ANY app (VS Code, Word, Discord, browser...)
  • Press Ctrl+R → voice-controlled AI: manage notes, set reminders, add appointments
  • Smart date parsing ("remind me next Tuesday" works!)
  • Animated floating widget with visual feedback
  • English + Italian supported

No internet required after setup. No subscriptions. Open source.
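To give a sense of what the smart date parsing involves, here's a minimal stdlib-only sketch that resolves "next \<weekday\>" phrases. This is purely illustrative – it is not Writher's actual implementation, and the function name is made up:

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
            "friday": 4, "saturday": 5, "sunday": 6}

def parse_next_weekday(phrase: str, today: date) -> Optional[date]:
    """Resolve phrases like 'remind me next tuesday' to a concrete date."""
    for word in phrase.lower().split():
        if word in WEEKDAYS:
            # Always land strictly in the future: 1..7 days ahead.
            days_ahead = (WEEKDAYS[word] - today.weekday() - 1) % 7 + 1
            return today + timedelta(days=days_ahead)
    return None
```

A real parser would also handle relative phrases ("in two weeks"), times of day, and the second supported language.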

GitHub: https://github.com/benmaster82/writher

Looking for feedback and contributors!

u/Immediate-Ice-9989 — 1 month ago

Is it a mistake to treat PII filtering as a retrieval-time step instead of an ingestion constraint in RAG?

It seems like RAG pipelines often do:

raw docs -> chunk -> embed -> retrieve -> mask output

But if documents contain emails, phone numbers, names, employee IDs, etc., the vector index is already derived from sensitive data. Wouldn't it be safer to redact at ingestion instead?

docs -> docs__pii_redacted -> chunk -> embed

Invariant: unsanitized text never gets chunked or embedded.

This seems safer from a data-lineage / attack-surface perspective, especially for local or enterprise RAG systems.
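A minimal sketch of that invariant in Python – the regex patterns here are illustrative stand-ins; a production system would use a dedicated PII detector (e.g., Presidio) and handle names properly:

```python
import re

# Illustrative patterns only; real PII detection needs far more coverage.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bEMP-\d{4,}\b"), "[EMPLOYEE_ID]"),
]

def redact(text: str) -> str:
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

def ingest(docs: list[str], chunk_size: int = 200) -> list[str]:
    """Enforce the invariant: only redacted text is ever chunked/embedded."""
    chunks = []
    for doc in docs:
        clean = redact(doc)  # sanitize BEFORE chunking
        chunks += [clean[i:i + chunk_size]
                   for i in range(0, len(clean), chunk_size)]
    return chunks            # only these ever reach the embedder
```

The point is structural: the embedder simply has no code path that sees raw documents, so the index can't leak what it never ingested.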

Or am I wrong?

Example: https://github.com/mloda-ai/rag_integration/blob/main/demo.ipynb

u/coldoven — 8 hours ago

Claude and Codex limits are getting really tight – what are good open-source alternatives, runnable locally, at near CC/Codex subscription pricing?

A lot of issues are cropping up in both Claude Code and Codex where the limits are getting really tight – it's barely usable. I'm looking into open-source alternatives that aren't very expensive to run on a VPS; basically I want something that costs at most $100/month USD to run, similar to the Claude Max plan.

It should at least be reasonably good at coding.

Any ideas? I hope I can find a good alternative, since things are going really badly. Would love any advice or guidance on what to try first.

u/abdoolly — 6 hours ago

I built a solo D&D adventure designed specifically for AI to DM

Looking for play testers for Chains of The Tempest

I've experimented with AI as the DM and, like many, have had mixed results. So I built something to fix that with the help of Claude: a self-contained mini adventure you upload directly into an AI chat (I've been testing on Claude), so the AI actually has the rules, encounters, and NPCs in front of it. Think of it as giving the AI a proper module to run rather than asking it to improvise an entire game from memory.

It includes custom skills that plug into Claude to handle things like dice rolls and spell lookups, plus some images to make it feel less like reading a wall of text. Claude built the skills for me.

I've been solo playtesting it and it's working way better than just prompting an AI cold, but I need more eyes on it. If you want to try it out and give feedback, DM me and I'll send it over.

u/BirchBirch72 — 8 hours ago

Is a cognitive‑inspired two‑tier memory system for LLM agents viable?

I’ve been working on a memory library for LLM agents that tries to control context size by splitting memory into short-term and long-term stores (I'm running on limited hardware, so context size is a main concern). It’s not another RAG pipeline; it’s a stateful, resource-aware system that manages memory across two tiers using pluggable vector storage and indexing:

* **Short‑Term Memory (STM)**: volatile, fast, with FIFO eviction and pluggable vector indexes (HNSW, FAISS, brute‑force). Stores raw conversation traces, tool calls, etc.

* **Long‑Term Memory (LTM)**: persistent, distilled knowledge. Low‑saliency traces are periodically consolidated (e.g., concatenation or LLM summarization) into knowledge items and moved to LTM.

**Saliency scoring** uses a weighted RIF model (Recency, Importance, Frequency). The system monitors resource pressure (e.g., RAM/VRAM) and triggers consolidation automatically when pressure exceeds a threshold (e.g., 85%).
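As an illustrative sketch of the RIF-weighted saliency idea – the weights, half-life, and threshold below are made-up assumptions, not VecMem's actual values:

```python
import math
from dataclasses import dataclass

@dataclass
class Trace:
    text: str
    created_at: float        # epoch seconds
    access_count: int = 0
    importance: float = 0.5  # e.g., assigned by a heuristic or the LLM

# Hypothetical weights and half-life; a real system would make these configurable.
W_RECENCY, W_IMPORTANCE, W_FREQUENCY = 0.5, 0.3, 0.2
HALF_LIFE_S = 3600.0

def saliency(t: Trace, now: float) -> float:
    """Weighted Recency/Importance/Frequency score in [0, 1]."""
    recency = math.exp(-math.log(2) * (now - t.created_at) / HALF_LIFE_S)
    frequency = 1.0 - math.exp(-t.access_count)  # saturates toward 1
    return W_RECENCY * recency + W_IMPORTANCE * t.importance + W_FREQUENCY * frequency

def consolidation_candidates(stm: list[Trace], now: float,
                             threshold: float = 0.3) -> list[Trace]:
    """Low-saliency STM traces to distill (e.g., summarize) into LTM."""
    return [t for t in stm if saliency(t, now) < threshold]
```

In practice the consolidation trigger would fire on resource pressure rather than on every call, and the summarized items would be re-embedded into the LTM index.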

What I’m unsure about:

  1. Does this approach already exist in a mature library? (I’ve seen MemGPT, Zep, but they seem more focused on summarization or sliding windows.)

  2. Is the saliency‑based consolidation actually useful, or is simple FIFO + time‑based summarization enough?

  3. Are there known pitfalls with using HNSW for STM (e.g., high update frequency, deletions)?

  4. Would you use something like this?

Thanks!

Source:

It was originally written in Java and I am porting it to Python.

Python https://github.com/Utilitron/VecMem

Java https://github.com/Utilitron/VectorMemory

u/utilitron — 23 hours ago

I left an AI loop running overnight. Woke up to 20 shipped agents.

So last month Karpathy dropped autoresearch. Autonomous loop, runs experiments overnight, keeps what works, throws away what doesn't. I watched it blow up and thought, this pattern is sick. But I don't do ML. What I do have is a problem that's been eating at me: finding good ideas to build.

In 2026, finding a problem worth solving is harder than actually solving it. Every obvious pain point has 12 SaaS tools already fighting over it. The interesting stuff is buried in Reddit threads at 2am where someone rants about something nobody's built for. I used to scroll those manually. Now I don't.

I took that same loop and pointed it somewhere else. My system scrapes Reddit, HN, GitHub, and Twitter for real problems. Scores them on demand, market gap, feasibility. If something clears the threshold it builds a standalone AI agent, validates it works, and commits it. The threshold ratchets up every build so the ideas have to keep getting better.
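A sketch of what that score-and-ratchet loop might look like – the dimension names, weights, and ratchet size here are my guesses, not the author's actual (TypeScript) code:

```python
from dataclasses import dataclass, field

@dataclass
class Idea:
    name: str
    demand: float       # 0-10
    market_gap: float   # 0-10 (0 = already solved)
    feasibility: float  # 0-10

def score(idea: Idea) -> float:
    # A gap of 0 kills the idea outright, matching the "GAP: 0" rejections.
    if idea.market_gap == 0:
        return 0.0
    return (idea.demand + idea.market_gap + idea.feasibility) / 3

@dataclass
class Pipeline:
    threshold: float = 5.0
    ratchet: float = 0.25
    shipped: list[str] = field(default_factory=list)

    def consider(self, idea: Idea) -> bool:
        if score(idea) <= self.threshold:
            return False             # logged to the research "graveyard"
        self.shipped.append(idea.name)
        self.threshold += self.ratchet  # bar rises after every build
        return True
```

The ratchet is the interesting design choice: it forces the loop toward increasingly non-obvious ideas instead of shipping twenty variations of the same easy win.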

Here's the part that surprised me. The system rejected over 80 ideas before shipping 20. Resume ATS optimizer? GAP: 0, there are already 10+ free tools. Salary negotiation advisor? GAP: 0. Insurance policy analyzer? GAP: 0. Food ingredient scanner? Yuka has 8M users. The research log reads like a graveyard of "obvious" ideas that are already solved. But then it found that wage theft affects 82M workers and there's no free tool that combines FLSA exemption analysis with state-specific overtime calculation. Built wage-rights-advisor. Found that only 5% of homeowners appeal their property tax, but 30 to 94% of those who do succeed. Built property-tax-appeal-advisor. Found that 70M Americans get contacted by debt collectors annually and every AI tool in that space serves the collectors, zero serve consumers. Built debt-collection-rights-advisor.

Now let me be real. How do you verify the quality? Not fully automated. The system boots each agent, sends a test prompt, checks if the output is useful. But these are MVPs. Some are rough. The research log with all the scored and rejected ideas, that's almost more valuable than the agents themselves. I wake up, look at what shipped, look at what got rejected and why, and pick the most promising direction. It's an idea machine that also writes the first draft of the code. When every obvious idea feels taken, the 80+ rejected ideas with documented reasoning for why they failed is honestly the best part.

Three files. program.md tells Claude Code where to research and what bar to hit. seed/ is a minimal Next.js template with 7 tools. run.sh launches Claude Code headless and auto-restarts on context limits. No LangChain, no CrewAI. TypeScript, MIT, runs on OpenRouter or Ollama. Each agent is standalone: clone and run.

u/Illustrious-Bug-5593 — 11 hours ago