Hi everyone,
I’ve been using Obsidian for over a year to manage all my project notes, client data, and interlinked documentation. Recently, I tried to level up my workflow by implementing an "LLM Wiki" style setup using Claude Code (CLI) connected to local models via Ollama and LM Studio.
The Setup:
- Interface: Claude Code CLI, with environment variables pointing to local endpoints (see the snippet after this list).
- Local Backends: Ollama / LM Studio (default configurations).
- Models tested: gemma-4 and qwen3.5
- Data: A well-linked Obsidian vault with 1+ years of markdown files.
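For reference, the wiring looks roughly like this. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL are variables Claude Code reads; the URL and model values below are just my local setup, and whether Ollama/LM Studio can answer Anthropic-style requests directly or needs a translating proxy in between depends on your versions:

```sh
# Point Claude Code at a local endpoint instead of Anthropic's hosted API.
export ANTHROPIC_BASE_URL="http://localhost:11434"  # Ollama's default port; LM Studio uses 1234
export ANTHROPIC_AUTH_TOKEN="local"                 # dummy value; local servers don't verify it
export ANTHROPIC_MODEL="qwen3.5"                    # must match a model available locally
```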
The Problem: The performance is incredibly slow. A simple query like "What is the current status of Project X?" takes about 3 minutes to return an answer.
I expected much snappier results since everything is text-based and local. I understand that local LLMs are limited by (V)RAM and GPU/CPU power, but 180 seconds for a status update feels excessive for a text-heavy workflow.
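To make answers concrete, here's the kind of baseline check I can run: hitting Ollama's native API directly, with Claude Code out of the loop entirely, to separate raw generation speed from whatever the CLI does with the vault (model name assumed to match what's pulled locally):

```sh
# Time one prompt straight against Ollama, bypassing Claude Code.
# If this comes back in seconds, generation speed isn't the problem and the
# 3 minutes is going into context gathering / file reads on the CLI side.
time curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5",
  "prompt": "Reply with one sentence: what is a status update?",
  "stream": false
}'
```

The JSON response also reports eval_count and eval_duration, which give a tokens-per-second figure for the generation itself.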
My Questions:
- Is this "normal" for a vault-wide RAG (Retrieval-Augmented Generation) or indexing task?
- Are there specific configurations in Ollama/LM Studio I should tweak to speed up context loading? (See the sketch below for what I've dug up so far.)
- Does Claude Code struggle specifically with the way local endpoints handle large context windows?
- Could the bottleneck be the indexing of a year's worth of notes every time I run a command?
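On questions two and four specifically, these are the server-side knobs I've found in Ollama's documentation so far; I haven't verified which of them matter for this workload, so treat this as a starting point rather than a fix:

```sh
# Set before starting `ollama serve`.
export OLLAMA_KEEP_ALIVE="30m"     # keep the model resident between requests,
                                   # so each CLI call doesn't pay a reload
export OLLAMA_FLASH_ATTENTION="1"  # faster attention path on supported hardware
export OLLAMA_KV_CACHE_TYPE="q8_0" # quantized KV cache; longer contexts fit
                                   # in 36 GB of unified memory
export OLLAMA_NUM_PARALLEL="1"     # single slot, so the context budget isn't
                                   # split across parallel requests
```

The keep-alive setting seems most relevant to the re-indexing question: if the model unloads between Claude Code's tool calls, every command pays the model load time again.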
I’d love to hear from anyone running a similar setup. How do you keep your local workflow fast?
Specs for context: MacBook Pro M3, 12 cores, 36 GB RAM