u/jimmy6929

Do you guys know what’s going on here?

I'm trying to print this model. I've printed it a couple of times before with the same settings and it came out fine, but now it just keeps turning out like this.

I switched to fresh new filament, and it still happens.

Has anyone had a similar experience before? How did you guys solve it?

Printer Model: X1C
Slicer used (e.g. Cura, Prusa, etc.): Bambu Lab
Filament material and brand: Bambu Lab PLA matte
Nozzle and bed temperature: 220 °C and 55 °C
Print Speed: 100 mm/s
Retraction settings: Default

u/jimmy6929 — 5 days ago
▲ 3 · r/LocalLLM + 2 crossposts

A couple of days ago, I posted asking whether people would trust a 10B model to edit their files. The feedback got me thinking: if the model has the right context, it makes the right decisions. So I built a RAG pipeline for my Obsidian vault. Here's the quick rundown:

Ingestion: Vault syncs incrementally (SHA-256 diff, so only changed files get re-processed). Documents are chunked with a markdown-aware splitter (512 chars, respects headings and code fences), embedded with Qwen3-Embedding-0.6B, and stored in SQLite + sqlite-vec. No separate vector DB; everything stays local.
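
A simplified sketch of the sync + chunking step (illustrative only, not the exact code; the embedding and sqlite-vec storage parts are left out, and the code-fence handling is skipped to keep it short):

```python
import hashlib
import re
from pathlib import Path

CHUNK_SIZE = 512  # characters per chunk

def file_sha256(path: Path) -> str:
    """Hash file contents so unchanged notes can be skipped on re-sync."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(vault_dir: str, known_hashes: dict[str, str]) -> list[Path]:
    """Return only the markdown files whose hash differs from the last sync."""
    changed = []
    for path in Path(vault_dir).rglob("*.md"):
        digest = file_sha256(path)
        if known_hashes.get(str(path)) != digest:
            known_hashes[str(path)] = digest
            changed.append(path)
    return changed

def chunk_markdown(text: str, max_chars: int = CHUNK_SIZE) -> list[str]:
    """Split on headings first, then pack sections into ~max_chars chunks."""
    sections = re.split(r"(?m)^(?=#{1,6}\s)", text)
    chunks, buf = [], ""
    for section in sections:
        if buf and len(buf) + len(section) > max_chars:
            chunks.append(buf)
            buf = ""
        buf += section
    if buf:
        chunks.append(buf)
    return chunks
```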

Retrieval: Hybrid search (vector + BM25 via Reciprocal Rank Fusion), then reranked with Qwen3-Reranker-0.6B (top 30 → top 8). Includes neighbor expansion and U-shape reordering to fight "lost in the middle."
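
The fusion and reordering steps are simple enough to sketch (simplified version, assuming both retrievers return chunk IDs ranked best-first; k=60 is the usual RRF constant, not necessarily what ships in the repo):

```python
def reciprocal_rank_fusion(vector_hits: list[str], bm25_hits: list[str],
                           k: int = 60, top_n: int = 30) -> list[str]:
    """Fuse two ranked lists of chunk IDs: each list contributes 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (vector_hits, bm25_hits):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]  # -> reranker

def u_shape_reorder(chunks: list[str]) -> list[str]:
    """Put the strongest chunks at the start and end of the context, weakest
    in the middle, to counter "lost in the middle". Input is best-first."""
    ordered: list[str] = [""] * len(chunks)
    left, right = 0, len(chunks) - 1
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            ordered[left] = chunk
            left += 1
        else:
            ordered[right] = chunk
            right -= 1
    return ordered
```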

The key part: a confidence router scores the retrieved context. High confidence routes to a strict citation-only mode; low confidence falls back to generative. This is what I think actually matters for agentic tasks: the model won't blindly edit your files if it isn't confident in what it found.
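
The router itself is nothing fancy. Conceptually it boils down to something like this (the threshold value is illustrative and depends on your reranker's score scale):

```python
def route_by_confidence(reranked: list[tuple[str, float]],
                        strict_threshold: float = 0.5) -> tuple[str, list[str]]:
    """Choose an answering mode from retrieval confidence.

    `reranked` is (chunk_text, reranker_score), best first. High confidence ->
    "citation" mode: answer strictly from the retrieved chunks and cite them.
    Low confidence -> "generative" fallback, which never triggers file edits.
    """
    if not reranked or reranked[0][1] < strict_threshold:
        return "generative", []
    return "citation", [text for text, _ in reranked]
```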

Everything runs on-device with FastAPI + any local LLM backend (Ollama, MLX, llama.cpp).
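
The serving layer is just a thin HTTP wrapper around whichever backend you run. A stripped-down sketch with Ollama as the backend (the /ask endpoint name and model tag are placeholders, and the real pipeline builds the prompt from retrieved chunks plus the router's chosen mode):

```python
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

class Query(BaseModel):
    question: str
    model: str = "llama3.1:8b"  # placeholder model tag

@app.post("/ask")
async def ask(query: Query) -> dict:
    # Forward the question to the local model; no cloud calls anywhere.
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(
            OLLAMA_URL,
            json={"model": query.model, "prompt": query.question, "stream": False},
        )
        resp.raise_for_status()
    return {"answer": resp.json()["response"]}
```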

My thesis: better retrieval → better decisions → safer file edits. Do you think this is the right approach, or is there a better way to make SLMs reliable enough for agentic work?

u/jimmy6929 — 6 days ago
▲ 1 r/SaaS

Hey everyone,

I've been building an open-source, self-hosted AI assistant called Molebie AI, and I just had one of those moments where the whole "local-first" thing really clicked for me.

I was on a flight, no WiFi, no internet at all, and I pulled up Molebie AI on my laptop to work through some documents I had locally. I used the RAG pipeline to query my files, had a normal chat conversation with a local model, and got actual work done. Everything ran on my machine. No API calls, no cloud, no errors.

I know this sub understands why this matters, but experiencing it in practice hit different. There's something satisfying about watching inference happen at 35,000 feet with airplane mode on.

For those unfamiliar, Molebie AI is a FastAPI + Next.js stack that supports multiple inference backends (Ollama, local models, etc.), has built-in RAG with hybrid search and reranking, voice mode, and a terminal observability dashboard. Everything is self-contained.

It's MIT-licensed and still early (v0.1), so if you try it and something breaks, I'd genuinely appreciate the feedback.

Repo: https://github.com/Jimmy6929/Molebie_AI

Happy to answer any questions about the stack or the experience.

u/jimmy6929 — 11 days ago