Hi everyone,
I’ve been using Obsidian for over a year to manage all my project notes, client data, and interlinked documentation. Recently, I tried to level up my workflow by implementing an "LLM Wiki" style setup using Claude Code (CLI) connected to local models via Ollama and LM Studio.
The Setup:
- Interface: Claude Code CLI, with environment variables pointing to local endpoints (see the snippet after this list).
- Local Backends: Ollama / LM Studio (default configurations).
- Models tested: gemma-4 and qwen3.5
- Data: A well-linked Obsidian vault with 1+ years of markdown files.
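For reference, the wiring looks roughly like this. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL are variables Claude Code reads; the URL and model values below are just my local setup, and whether Ollama/LM Studio can answer Anthropic-style requests directly or needs a translating proxy in between depends on your versions:

```sh
# Point Claude Code at a local endpoint instead of Anthropic's hosted API.
export ANTHROPIC_BASE_URL="http://localhost:11434"  # Ollama's default port; LM Studio uses 1234
export ANTHROPIC_AUTH_TOKEN="local"                 # dummy value; local servers don't verify it
export ANTHROPIC_MODEL="qwen3.5"                    # must match a model available locally
```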
The Problem: The performance is incredibly slow. A simple query like "What is the current status of Project X?" takes about 3 minutes to return an answer.
I expected much snappier results since everything is text-based and local. I understand that local LLMs are limited by (V)RAM and GPU/CPU power, but 180 seconds for a status update feels excessive for a text-heavy workflow.
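To make answers concrete, here's the kind of baseline check I can run: hitting Ollama's native API directly, with Claude Code out of the loop entirely, to separate raw generation speed from whatever the CLI does with the vault (model name assumed to match what's pulled locally):

```sh
# Time one prompt straight against Ollama, bypassing Claude Code.
# If this comes back in seconds, generation speed isn't the problem and the
# 3 minutes is going into context gathering / file reads on the CLI side.
time curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5",
  "prompt": "Reply with one sentence: what is a status update?",
  "stream": false
}'
```

The JSON response also reports eval_count and eval_duration, which give a tokens-per-second figure for the generation itself.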
My Questions:
- Is this "normal" for a vault-wide RAG (Retrieval-Augmented Generation) or indexing task?
- Are there specific configurations in Ollama/LM Studio I should tweak to speed up context loading? (See the sketch below for what I've dug up so far.)
- Does Claude Code struggle specifically with the way local endpoints handle large context windows?
- Could the bottleneck be the indexing of a year's worth of notes every time I run a command?
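On questions two and four specifically, these are the server-side knobs I've found in Ollama's documentation so far; I haven't verified which of them matter for this workload, so treat this as a starting point rather than a fix:

```sh
# Set before starting `ollama serve`.
export OLLAMA_KEEP_ALIVE="30m"     # keep the model resident between requests,
                                   # so each CLI call doesn't pay a reload
export OLLAMA_FLASH_ATTENTION="1"  # faster attention path on supported hardware
export OLLAMA_KV_CACHE_TYPE="q8_0" # quantized KV cache; longer contexts fit
                                   # in 36 GB of unified memory
export OLLAMA_NUM_PARALLEL="1"     # single slot, so the context budget isn't
                                   # split across parallel requests
```

The keep-alive setting seems most relevant to the re-indexing question: if the model unloads between Claude Code's tool calls, every command pays the model load time again.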
I’d love to hear from anyone running a similar setup. How do you keep your local workflow fast?
Specs for context: MacBook Pro M3, 12 cores, 36 GB RAM