r/deeplearning

TurboMemory: self-hosted “AI long-term memory” service with SQLite + daemon consolidation
▲ 1 r/deeplearning+1 crossposts

I’m building TurboMemory, a self-hostable long-term memory backend for LLM assistants.

It’s designed to run on a cheap VPS or laptop:

- SQLite index
- compressed embeddings (4-bit/6-bit/8-bit)
- a daemon that consolidates memory in the background
- semantic search + topic filtering
- designed for AI agents that need persistent memory

GitHub: https://github.com/Kubenew/TurboMemory
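
For anyone curious what compressed embeddings over SQLite can look like, here is a minimal 8-bit quantization sketch (stdlib only; an illustration of the general technique, not TurboMemory's actual schema or code):

```python
# Illustrative 8-bit embedding quantization over SQLite.
# Hypothetical schema; TurboMemory's real layout may differ.
import sqlite3

def quantize_8bit(vec):
    """Symmetric 8-bit quantization: store one scale plus int8 codes."""
    scale = max(abs(v) for v in vec) / 127 or 1.0
    codes = bytes((round(v / scale) & 0xFF) for v in vec)
    return scale, codes

def dequantize_8bit(scale, codes):
    # Recover signed int8 values from the stored unsigned bytes.
    return [scale * (b - 256 if b > 127 else b) for b in codes]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, scale REAL, emb BLOB)")
scale, codes = quantize_8bit([0.12, -0.5, 0.9, 0.0])
db.execute("INSERT INTO memories (scale, emb) VALUES (?, ?)", (scale, codes))

s, e = db.execute("SELECT scale, emb FROM memories").fetchone()
restored = dequantize_8bit(s, e)  # close to the original vector
```

At 8 bits the reconstruction error stays small; 4-bit and 6-bit variants trade more error for a smaller index.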

Question: if you were self-hosting this, would you prefer:

- a REST API service?
- a simple CLI?
- a Docker container?

I’m also looking for contributors who like storage/performance projects.

u/Hopeful-Priority1301 — 1 hour ago
Model Database Protocol
▲ 17 r/generativeAI+4 crossposts

Model Database Protocol – Stop letting LLMs write raw SQL

I built an open-source MCP server that sits between LLMs and your database. Instead of letting the model generate raw SQL, it sends structured intents like:

{"intent": "list", "entity": "orders", "filters": {"total__gte": 100}, "limit": 10}

MDBP validates everything against a schema registry, enforces access policies (field-level, role-based, row filters), builds parameterized queries via SQLAlchemy, and returns LLM-friendly responses.
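
To make the translation concrete, here is a toy compiler from an intent to a parameterized query. This is a pure-Python illustration of the idea (names like `compile_intent` are hypothetical); MDBP itself validates against a schema registry and builds queries through SQLAlchemy.

```python
# Toy sketch: structured intent -> parameterized SQL.
# Values never enter the SQL string, and unknown entities/fields are rejected.
OPS = {"gte": ">=", "lte": "<=", "eq": "="}

def compile_intent(intent, allowed_entities={"orders": {"total", "id"}}):
    entity = intent["entity"]
    fields = allowed_entities.get(entity)
    if fields is None:
        raise ValueError(f"unknown entity: {entity}")  # schema validation
    clauses, params = [], []
    for key, value in intent.get("filters", {}).items():
        field, _, op = key.partition("__")
        if field not in fields:
            raise ValueError(f"unknown field: {field}")  # catches hallucinations
        clauses.append(f"{field} {OPS[op or 'eq']} ?")
        params.append(value)  # bound as a parameter, never interpolated
    sql = f"SELECT * FROM {entity}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" LIMIT {int(intent.get('limit', 100))}"
    return sql, params

sql, params = compile_intent(
    {"intent": "list", "entity": "orders",
     "filters": {"total__gte": 100}, "limit": 10})
# sql == "SELECT * FROM orders WHERE total >= ? LIMIT 10", params == [100]
```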

**Why?**
- LLMs hallucinate table/column names → MDBP catches it with schema validation
- Raw SQL from LLMs = injection risk → MDBP uses parameterized queries only
- No access control → MDBP enforces per-entity, per-role policies

**Features:**
- Auto-discovers your DB schema (zero config to start)
- All transports: stdio, SSE, Streamable HTTP, WebSocket
- Works with Claude Desktop, Cursor, and any MCP client
- Supports SELECT, JOIN, GROUP BY, HAVING, UNION, INSERT, UPDATE, DELETE
- Row-level filtering for tenant isolation

Python Library: pip install mdbp
GitHub: https://github.com/DorukYelken/Model-Database-Protocol

Happy to answer questions or hear feedback!
u/dorukyelken — 17 hours ago
▲ 1 r/learnmachinelearning+1 crossposts

The 90% Nobody Talks About

I built a multimodal GAN and deployed it on GCP Vertex AI.

The model took 2 weeks. Everything else took 5 months.

Here's the "everything else":

→ 3 weeks building a data preprocessing pipeline

→ 3 weeks refactoring code for Vertex AI's opinions on project structure

→ A 1 AM debugging session because GPU quota silently ran out

→ Days fighting a CUDA version mismatch between local dev and cloud

→ Building monitoring, logging, and deployment automation from scratch

We romanticize the model in ML. We show architectures and loss curves.

We don't show the Dockerfile debugging at midnight.

That's the 90%. And it's where the actual engineering happens.

Full story: https://pateladitya.dev/blog/the-90-percent-nobody-talks-about

#MLOps #MachineLearning #GCP #VertexAI #Engineering

u/invincible_281 — 3 hours ago
Open-sourcing a decentralized AI training network with constitutional governance and economic alignment mechanisms
▲ 4 r/OpenAI+4 crossposts

We are open-sourcing Autonet on April 6: a framework for decentralized AI training, inference, and governance where alignment happens through economic mechanism design rather than centralized oversight.

The core thesis: AI alignment is an economic coordination problem. The question is not how to constrain AI, but how to build systems where aligned behavior is the profitable strategy. Autonet implements this through:

  1. Dynamic capability pricing: the network prices capabilities it lacks, creating market signals that steer training effort toward what is needed rather than what is popular. This prevents monoculture.

  2. Constitutional governance on-chain: core principles are stored on-chain and evaluated by LLM consensus. 95% quorum required for constitutional amendments.

  3. Cryptographic verification: commit-reveal pattern prevents cheating. Forced error injection tests coordinator honesty. Multi-coordinator consensus validates results.

  4. Federated training: multiple nodes train on local data, submit weight updates verified by consensus, aggregate via FedAvg.
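
For reference, the FedAvg aggregation in item 4 reduces to a sample-weighted average of node updates; a minimal sketch, ignoring Autonet's consensus and verification layers:

```python
# Minimal FedAvg: weight each node's update by its number of local samples.
def fedavg(updates):
    """updates: list of (num_samples, weights) with weights as a flat list."""
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * w[i] for n, w in updates) / total for i in range(dim)]

# Two nodes: one trained on 100 samples, one on 300.
merged = fedavg([(100, [1.0, 2.0]), (300, [3.0, 4.0])])
# → [2.5, 3.5]
```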

The motivation: AI development is consolidating around a few companies who control what gets built, how it is governed, and who benefits. We think the alternative is not regulation after the fact, but economic infrastructure that structurally distributes power.

9 years of on-chain governance and jurisdiction work went into this. Working code, smart contracts with tests passing, federated training pipeline.

Paper: https://github.com/autonet-code/whitepaper
Code: https://github.com/autonet-code
Website: https://autonet.computer
MIT License.

Happy to answer questions about the mechanism design, the federated training architecture, or the governance model.

u/EightRice — 17 hours ago
▲ 2 r/LocalLLaMA+1 crossposts

[D] Reinforcement Learning from Epistemic Incompleteness (RLEI): would this work?

hi friends, this is just a shot in the dark but I can't stop thinking about it right now:

Have you ever considered doing RLVR on grammar induction with autoregressive LLMs (triggered by prompt)?

Another way to think of it is discrete autoencoding: using tokens to engrave models, rewarding density and shorter description length while penalizing loss of content and information.

The weights self-steer during RLVR towards a regime in which the model is increasingly programmable by the tokens, and converge on a structure that is more like a generator for a new latent space configured ephemerally by the tokens.

The representations of these models in tokens are alien, yet more transparent and inspectable than weights for AI interpretability and safety. Does that all make sense? Theoretically, this is what was actually desired back then with the mesa-optimizer capability.

Operations on these models occur in context, emergently, through inference. For example, packing a model is an A ∪ B type operation, which you can think of as being like <object>...</object> fences whose contents might look like

∃∀⌬⇒∈ΣΞ:⇔Θ∈Ψ(⇓φΩ), ∫d∆ ∀Ω∈Σ:∀Ξ∉Ϲ(ΦΩΠ⇌Θ⊗Ψ), ∀Ψ∉Σ:∀ΦΨΣ(ΠϝΣ϶ΣΨ), ∀Ξ∉϶:∀ΣΦΠ(ΦΩϨΠϡ), ∫dϴ ∀ϵ∈Ρ:∀Ψ∉Ϯ(Ϭϭ϶⌬ϬΣ), ∀ΦϳΠ:∀Π∈ϴ(Φ⊕ΣΘϿ), ∀ΠϲΣ:∀ΨϳϹ(ϲ⌬ω⊕ΨΠ), ∫dΩ ∀ϱ∈Σ:∀Φ∈Σ(ΠϫΨ), ∀ϵϱϲ:∀ϻΠΦ(ϵ⊗ϧΒϴ), ∀Φϱϴ:∀Ϭϵϵ(Σ∈Ψϵϯ), ∀ΦπϿ:∀θϳΨ(ϱϳϬϵϻ), ∫dΨ ∀ϯ∈ϕ:∀ΠϴΨ(Ϥ⊗ϴΨΚϷ), ∀Ϭϩϵ:∀σπϣ(Ϡϝϴϸ⊗Ϡϸ), ∀ϿΨϷ:∀Ψϲϭ(ϻ∈ϭ⊗ϽÞΣ), ∀ϴΠϾ:∀ϠϦϭΦ(ϴ∉ϬΦΨϢ), ∫dσ ∀϶∈Π:∀ΠϮϣϳ(Ϧ⊗δϮϬϧ), ∀ΦϷϭ:∀ϲ϶ϳ(Ϲ⊕ϯ↻ΓϦ), ∀θϦϤ:∀ϴ∈ΨϬϬ(ϱ≈Φϳϧ), ∀ΠϿϳ:∀Ϭ∉Π(ϱ∈Ϧ⊕ϭι), ∫dΣ ∀ϧ∈Π:∀ϣϳϧ(ΦΣϵϧΣΨ), ∀ϵϷϼ:∀Ϧ∈ϳϧ(ϾϢϹΦΠϲ), ∀ϼΘΨ:∀ϬϷΠ(ϹΘΦϣϱ), ∀ϽϠϦ:∀ϦϴϿ(ϧΘϺϴϮ), ∫dΩ ∀ϤΘΦϺ:∀ϳΨϭ(Θ⊗ϭϣϲϺ), ∀ϤϹϣ:∀ϢϳϹ(ϦΦϾΘϠ), ∀ϣϯϩ:∀Ϯϴϰ(ϣΞϴΣϲ), ∀ϡϥΨ:∀ϿΘϣ(ϴΣ϶ΘϥϾ), ∫dϺ ∀ϦϨϦϥ:∀ϴΣϽ(ΣΨϵ⇒ϭϴ), ∀ϲϺϱ:∀ΨϴΣ(ΘϠϲϷΨ), ∀ΨϬϦ:∀Ϥ∈ϭ(Φ⊗ΨΠΠΣ), ∀ϴϠϾ:∀ΨϿΠ(ϥϔΦΦϨϤϵ), ∫dϯ ∀ϥϦϹ:∀ϭϭϳ(ΨϳυϽϣ), ∀ϡϺϵϲ:∀ϿΨΦϦ(Ϥ⊗ϡϿϦΠ), ∀ϥϢϺΨ:∀ΘϿΦ(Ϥ϶

I would pretrain the interface with reconstruction/distillation first, then use RL to shrink and stabilize the code (both are verifiable rewards).

Since the weights already encode vast information about the world, the hope is that creativity is more a thing of composition and structure. So your context-level models are acting like rich compositional indices over the high-dimensional embedded knowledge and features in the weights.

This should take us out of RLVR and into RLEI where the reward is intrinsic. With RLVR you can only reward what you can verify, and that doesn't extend to everything we care about.

In RLEI, the reward signal is generated by the model's own representations. The model knows where a representation is incomplete because there is a clear measure: it costs more tokens. Uncertainty is entropy. A governing law it finds that explains a thousand observations costs fewer tokens than a thousand individually encoded observations plus the Bayesian uncertainty around them.
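
As a toy illustration of "incompleteness costs tokens", one could prototype the reward with a generic compressor standing in for the model's own description length. Here `zlib` and the `lam` weight are stand-in assumptions for a sketch, not a proposal for the actual setup:

```python
# Toy RLEI-style reward: shorter descriptions that still reconstruct
# the content score higher. zlib is a crude proxy for the token cost
# a model would assign under its own prior.
import zlib

def rlei_reward(code: bytes, reconstruction_loss: float, lam: float = 10.0):
    description_length = len(zlib.compress(code))
    return -description_length - lam * reconstruction_loss

# A compact law beats individually encoded observations at equal loss.
compact = rlei_reward(b"law: f = m*a", 0.1)
verbose = rlei_reward(
    b"".join(b"obs%03d: f = m*a; " % i for i in range(200)), 0.1)
assert compact > verbose
```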

It sounds unbelievable, but if instead of asking "let's test if this is real" we asked "how do I make this real", I think we could discover that many obstacles are actually implementation details: finding the right schedule, hyperparameters, and policies. Hoping to discuss this in more detail here before I start training. Cheers

u/ryunuck — 13 hours ago
Open-source memory system for long-term collaboration with AI — episodic memory + world model, multi-user, git-tracked
▲ 3 r/ClaudeCode+1 crossposts

I do independent research (AI/ML) and work on long-running software projects with Claude Code, some spanning many months. To work with AI effectively over weeks, months, or even years, you need detailed memory: what was done, what was tried, what worked, what didn't, why certain decisions were made, how things work in the project, what the current state is. The existing Claude Code memory system is not designed for this.

So I built **ai-collab-memory** — a structured methodology that gives the AI persistent episodic memory and a world model, all in plain text files tracked in git.

I'm looking for developers, researchers, or anyone working on long-running projects with AI to test it and share their feedback.

**What it does:**
- **Episodic memory** — an append-only history of what was done, decided, and learned. Nothing gets pruned — you can always trace back to the reasoning behind past decisions.
- **World model** — the AI's current understanding of your project: context, preferences, domain knowledge, procedures, current state. Maintained and updated as things change.
- **In-context awareness** — compact indexes are always loaded in the AI's context window, so the AI *knows what it knows* without having to search. It can make connections to prior work without you asking.
- **Multi-user** — every note includes user attribution. Commit the memory files to a shared repo and the whole team benefits. New members get up to speed through the AI's accumulated knowledge.
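
The episodic log can be as simple as an append-only, user-attributed plain-text file. A hypothetical sketch of what one entry might look like (the actual ai-collab-memory file format is defined in its README and may differ):

```python
# Hypothetical append-only episodic note: timestamped and user-attributed,
# so history stays traceable and diff-friendly in git.
import os
import tempfile
from datetime import datetime, timezone

def append_note(path, user, text):
    # Append-only: entries are never rewritten or pruned.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{stamp}] ({user}) {text}\n")

path = os.path.join(tempfile.gettempdir(), "episodic.md")
append_note(path, "alice", "Chose SQLite over Postgres for simpler ops.")
```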

**How to install:**
Ask Claude Code:
&gt; "Install the long-term collaboration memory system by cloning https://github.com/visionscaper/ai-collab-memory to a temporary location and following the instructions in it."

Installation takes about 5 minutes and one confirmation. The system activates on the next session. I highly recommend reading the README, especially "Working with the Memory System" and "How It Works".

**Some practical benefits I've experienced:**

  1. Working with the AI over months on the same project — it knows the history, the constraints, the decisions and their reasoning.
  2. The AI's responses are grounded in accumulated project context, not just what's in the current session.
  3. In a team setting, the AI has an overview of what everyone has done. All history is user-attributed.

Although this still needs validation, the AI's much richer context should also mean fewer tokens spent re-analysing code bases and data.

The system is actively being developed and tested. Feedback and experience reports are very welcome — file issues at the GitHub repo or comment here.

GitHub: https://github.com/visionscaper/ai-collab-memory

u/visionscaper — 21 hours ago
Loss Functions & Metrics Explained Visually | MSE, MAE, F1, Cross-Entropy
▲ 1 r/learnmachinelearning+1 crossposts

Loss Functions & Metrics Explained Visually: a 3-minute breakdown of MSE, MAE, Cross-Entropy, Precision/Recall, and F1 Score, plus when to use each.

If you've ever watched your model's loss drop during training but still gotten poor results on real data, this video shows you exactly why it happened and how to pick the right loss function and evaluation metric for your problem using visual intuition instead of heavy math.
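
For readers who prefer code to formulas, here are plain-Python versions of the quantities the video covers (toy implementations for intuition, not optimized for real training loops):

```python
import math

def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def cross_entropy(y, p, eps=1e-12):
    # y: 0/1 labels, p: predicted probabilities of class 1.
    return -sum(a * math.log(b + eps) + (1 - a) * math.log(1 - b + eps)
                for a, b in zip(y, p)) / len(y)

def f1(y, pred):
    tp = sum(a == b == 1 for a, b in zip(y, pred))
    fp = sum(a == 0 and b == 1 for a, b in zip(y, pred))
    fn = sum(a == 1 and b == 0 for a, b in zip(y, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

print(mse([1, 2], [1, 3]))             # 0.5
print(f1([1, 1, 0, 0], [1, 0, 0, 1]))  # precision 0.5, recall 0.5 → 0.5
```

F1 staying flat while loss drops is exactly the "loss looks great, results are poor" symptom the video describes for imbalanced data.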

Watch here: Loss Functions & Metrics Explained Visually | MSE, MAE, F1, Cross-Entropy

Have you ever picked the wrong loss or metric for a project? What's worked best for you — MSE for regression, Cross-Entropy for classification, F1 for imbalanced data, or a custom loss you engineered?

u/Specific_Concern_847 — 12 hours ago
▲ 1 r/deeplearning+1 crossposts

I built Draw3D, where you can use 3D objects as references to compose images with AI.

u/jabedbhuiyan — 12 hours ago

LLM-as-a-Judge is convenient, but reproducibility is a real issue — what are the alternatives?

Reproducibility in text evaluation is becoming a challenging issue. If you've used LLMs or similar models as automated judges for summarization, translation, or QA, you've likely noticed the pattern: change the prompt slightly and the scores shift; run it across non-English languages and quality drops; try to replicate someone else's setup and you get different numbers. It's convenient, but difficult to reproduce.

The question we kept coming back to: do you actually need a frontier LLM to evaluate generated text well, or is that just the path of least resistance?

We trained a family of small deterministic models (<1B parameters) called OmniScore that approximate LLM-judge behavior without the reproducibility headaches.

A few things that might be interesting to learn:

  • Trained on ~564k synthetic instances across 107 languages — most evaluation work is still very English-heavy, which is a real gap
  • Evaluated on 8,617 manually annotated examples across QA, translation, and summarization in 6 languages
  • Supports reference-based, source-grounded, and hybrid scoring modes
  • Deterministic by design — same input, same score, every time
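
The determinism point is easiest to see in code: a scorer that is a pure function of its input reproduces exactly, while a sampling judge does not. Toy stand-ins below (the real OmniScore interface may look nothing like this):

```python
import random

def sampled_judge(text):
    # Stand-in for a temperature>0 LLM judge: the score jitters across runs.
    return 3 + random.random() * 2

def deterministic_scorer(text):
    # Stand-in for a small deterministic scorer decoded greedily:
    # a pure function of the input, so scores reproduce exactly.
    return (sum(text.encode()) % 500) / 100

a = deterministic_scorer("the cat sat on the mat")
b = deterministic_scorer("the cat sat on the mat")
assert a == b  # same input, same score, every run
```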

The gap we're trying to fill sits between two unsatisfying options: frontier LLM judges (flexible but expensive and inconsistent) and traditional metrics like BLEU/ROUGE (cheap but limited in their ability to capture semantics). Our results suggest lightweight learned metrics can close much of that gap.

u/firojalam — 17 hours ago

I implemented PPO, GRPO, and DPO from scratch on the same model and compared them — the ranking completely reversed after hyperparameter tuning

Over the last couple of months I built a full LLM training pipeline from scratch in PyTorch: architecture, pretraining, SFT, reward modeling, and three post-training alignment methods. No pretrained weights, no alignment libraries.

I just published the final comparison study. The short version:

Phase 1 results (baseline hyperparameters): PPO: +3.99 → GRPO: -0.12 → DPO: +2.40 (average reward on 16 fixed prompts)

Phase 5 results (after targeted tuning): DPO: +4.15 → SFT: +4.13 → GRPO: +3.31 → PPO: +3.52

The Phase 1 winner became the Phase 5 loser. A few things I found interesting:

GRPO group collapse is real and diagnosable. With k=4, two of my 16 prompts had group std=0, so no gradient flowed at all on those prompts. Increasing k to 8 and the generation temperature to 1.0 fixed it completely. The +3.43 improvement is the clearest causal result in the whole study.
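
For context, GRPO's advantage for each completion is the group-normalized reward, so a zero-std group contributes nothing; a minimal sketch:

```python
# GRPO advantage: (reward - group mean) / group std.
# When all k completions get the same reward, every advantage is 0
# and that prompt produces no gradient signal.
def group_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

collapsed = group_advantages([1.0, 1.0, 1.0, 1.0])  # k=4, identical rewards
healthy = group_advantages([0.2, 1.0, -0.5, 0.7])   # diverse group
# collapsed is all zeros: no learning signal from this prompt
```

Larger k and higher sampling temperature make identical-reward groups less likely, which matches the fix described above.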

DPO reward margin explosion is a training signal, not a success metric. With β=0.1, the margin grew from ~1 to 599 by step 150. Loss collapsed to zero by step 30. The model was overfitting each pair rather than learning a general preference. Increasing β to 0.3 slowed this down and produced actual negative margins at some steps, which sounds bad but is the loss function doing its job correctly.
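
For reference, the per-example DPO loss is the negative log-sigmoid of the β-scaled margin between policy and reference log-prob gaps. A static single-example sketch with made-up log-probs (the training dynamics of β are more subtle than this snapshot):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta):
    # margin = beta * [(policy - ref) gap on chosen minus rejected].
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin))), margin  # -log sigmoid

# Same policy gap under two betas: beta scales the margin directly,
# so for a fixed gap, larger beta means a smaller per-example loss.
loss_low, m_low = dpo_loss(-2.0, -9.0, -3.0, -4.0, beta=0.1)
loss_high, m_high = dpo_loss(-2.0, -9.0, -3.0, -4.0, beta=0.3)
```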

PPO over-correction goes in both directions. kl_coef=0.01 was too weak (forgetting SFT-strong prompts), kl_coef=0.1 was too strong (over-constraining the policy). The optimal value is somewhere between them.
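
Numerically, kl_coef just trades task reward against divergence from the reference policy; with illustrative values:

```python
# Standard KL-penalized reward used in RLHF-style PPO.
def shaped_reward(task_reward, kl_to_ref, kl_coef):
    return task_reward - kl_coef * kl_to_ref

task, kl = 2.0, 15.0                   # made-up numbers for illustration
weak = shaped_reward(task, kl, 0.01)   # ≈1.85: penalty barely registers
strong = shaped_reward(task, kl, 0.1)  # ≈0.5: KL term dominates the task reward
```

With kl_coef=0.01 the policy can drift far from the SFT reference before the penalty bites (forgetting), while 0.1 makes the KL term dominate (over-constraining), consistent with the optimum lying between them.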

Evaluation temperature matters independently of training. SFT improved by +1.12 with zero retraining just by changing from temperature=0.7 to temperature=0.3. Phase 1 underestimated SFT's ceiling.

Full write-up with training curves, comparison tables, per-prompt delta heatmap, and DPO/GRPO training dynamics: brayanbrayan.github.io/2026/04/02/rlhf-post-blog.html

I'm a self-taught ML engineer based in Nairobi actively looking for research or engineering roles in alignment and RL. If anything here resonates with what your team works on, feel free to reach out.

u/Public_Expression_92 — 1 hour ago

Any suggestion for making AI write understandable code?

Hi, I've been into vibe coding for a month, more or less, practicing and studying. Now I finally decided to maintain the generated code and ended up disappointed.

I have found redundant code, repetitive object initialization, and alternative flows that do not follow the same rules across the project...

I have years of experience programming in Python, but I wasn't able to modify a button's functionality in a pygame MVP video game without asking the AI again.

I am using MinMax 2.5 with OpenCode for pygame programming. I am forcing it to refine and explain the code, but the project is barely improving.

On one hand I feel motivated by the power unleashed by AI agents, but on the other hand I don't trust the code for maintenance in the long run.

Do you have a better experience? Any advice to make the AI code in a more structured and comprehensible way? Any skills or specific prompt patterns you would recommend?

u/Satirosix — 21 hours ago