u/Ambitious_Fold_2874

Anyone running Mimo-v2.5 quants with multimodal and MTP?

Has anyone been able to run a Q4 or Q5 quant of XiaomiMiMo/MiMo-V2.5 through llama.cpp with both multimodal capability and MTP working? Only AesSedai's GGUF quants appear to include an mmproj, and it's unclear whether the MTP layers are preserved in them.

I only have 40 GB of VRAM, but 256 GB of 4-channel DDR4 RAM, so I'm not expecting great inference speed. Still, the model's strength and multimodal capabilities intrigued me, so I wanted to give it a go. It looks like MTP support in llama.cpp is still in a draft branch, so it seems I'll have to build from that.
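For context, this is roughly the invocation I'm planning. The filenames are placeholders for whichever of AesSedai's quant files you grab, and I've left out any MTP flags since I don't know yet what the draft branch expects:

```shell
# Hypothetical llama-server launch; filenames are placeholders.
# --mmproj loads the multimodal projector alongside the main model;
# -ngl offloads however many layers fit in the 40 GB of VRAM, with the
# rest of the weights staying in system RAM.
./llama-server \
  -m MiMo-V2.5-Q4_K_M.gguf \
  --mmproj mmproj-MiMo-V2.5.gguf \
  -ngl 20 \
  -c 8192
```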

u/Ambitious_Fold_2874 — 1 day ago
r/anime

Seinen or adult (non-NSFW) anime?

Already watched all the most popular anime (FMA: Brotherhood, Frieren, AoT, Death Note, JJK, Monster, Oshi no Ko, Mushoku Tensei, Code Geass, Parasyte, etc.), and I'm getting to an age where anime about middle/high school doesn't appeal to me anymore.

Wondering if there are good seinen anime, or anime for adults, that people would recommend? I especially enjoy good animation.

My favorite anime of all time is Hinamatsuri haha, but recently I've really enjoyed Heavenly Delusion. Dorohedoro was a bit too weird for me.

u/Ambitious_Fold_2874 — 6 days ago

I set up a Hermes agent with the Honcho memory system, all locally. Running into a lot of issues with Honcho though: attribution bugs, over-extraction, and observation bloat. Anyone else familiar with these?

  1. Speaker attribution bug in the deriver

The deriver is incorrectly attributing user facts to the AI. I tried getting Hermes to patch src/deriver/prompts.py, but it still produces observations like "hermes likes bagels" when the fact is about me. The observation extractor doesn't seem to distinguish speaker roles reliably. Has anyone solved this? Is there a deriver prompt tweak that forces proper speaker disambiguation?
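The stopgap I'm considering is a post-extraction guard rather than a prompt fix. This is just a sketch, not Honcho's actual API, and it naively assumes the first word of an observation names its subject:

```python
# Hypothetical guard: drop any observation whose subject is the agent
# rather than the user. AGENT_NAMES and filter_misattributed are my own
# names, not part of Honcho.
AGENT_NAMES = {"hermes", "assistant", "ai"}

def filter_misattributed(observations):
    """Keep only observations whose first word is not an agent name."""
    kept = []
    for obs in observations:
        subject = obs.split(" ", 1)[0].lower()
        if subject not in AGENT_NAMES:
            kept.append(obs)
    return kept

filter_misattributed(["hermes likes bagels", "user likes bagels"])
# → ["user likes bagels"]
```

It obviously can't fix a fact the deriver has already rewritten into the wrong subject, so a real prompt-level fix would still be better.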

  2. Over-extraction of trivial metadata

The deriver is logging things like message timestamps, language use patterns, and transient states ("is hungry", "is relaxed") as persistent facts. I've tried adjusting the deriver prompt to be more selective, but it keeps generating noise. What thresholds or prompt instructions do you use to keep only signal?
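Absent a prompt fix, I've been thinking about a crude keyword filter between extraction and persistence. A sketch, with example (not exhaustive) keyword lists of my own choosing; this is not a Honcho setting:

```python
import re

# Hypothetical signal filter: reject transient states and message
# metadata before they're persisted as facts.
TRANSIENT = re.compile(r"\bis (hungry|tired|relaxed|bored|sleepy)\b", re.I)
METADATA = re.compile(r"\b(timestamp|sent at|responded|message length)\b", re.I)

def is_signal(observation: str) -> bool:
    """True if the observation looks like a durable fact, not noise."""
    return not (TRANSIENT.search(observation) or METADATA.search(observation))

obs = ["user is hungry", "user works as a nurse", "message timestamp was 14:02"]
[o for o in obs if is_signal(o)]
# → ["user works as a nurse"]
```

A keyword list will never cover everything, which is why I'd still prefer a prompt instruction that only durable, identity-level facts get extracted.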

  3. Observation deduplication / bloat

I'm getting the same fact extracted multiple times across sessions with slight rewording. My observation DB grew from a few hundred to 2,600+ entries before Hermes noticed and manually pruned it back to ~700. Is there a dedup or consolidation strategy that works well? I'm wondering if I should be post-processing with a curator or if the deriver itself can be configured to check for existing entries before extracting.
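The curator step I'm imagining is a similarity gate before insert, sketched here with token overlap just to keep it self-contained; `insert_if_new` and the 0.6 threshold are my own hypothetical names/values, not Honcho's:

```python
# Dedup gate sketch: skip a new fact if a near-duplicate is already stored.
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two observation strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def insert_if_new(db: list, fact: str, threshold: float = 0.6) -> bool:
    """Append fact unless an existing entry is too similar; return whether stored."""
    if any(jaccard(fact, existing) >= threshold for existing in db):
        return False
    db.append(fact)
    return True

db = ["user enjoys seinen anime"]
insert_if_new(db, "user enjoys seinen anime a lot")
# → False (4/6 token overlap exceeds the 0.6 threshold, so it's skipped)
```

In a real setup I'd swap `jaccard` for cosine similarity over the Qwen3 embeddings Honcho already computes, so rewordings with little lexical overlap still get caught.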

  4. Honcho vs Obsidian overlap

Since Honcho already stores persistent memory and observations, is Obsidian still worth integrating? Or does Honcho make a separate note-taking vault redundant for most use cases?

Setup: local Honcho instance, hybrid mode, Qwen3.6 models (vLLM + llama.cpp), PostgreSQL backend
Hermes agent chat model & Honcho dialectic chat model: Qwen3.6 27B NVFP4 with MTP
Honcho chat model for everything else (deriver, etc.): Qwen3.6 35B A3B Q4
Honcho embedding model: Qwen3 Embedding 0.6B

u/Ambitious_Fold_2874 — 6 days ago