r/OpenSourceeAI

▲ 16 r/aigamedev+1 crossposts

I built a UGC game town for OpenClaw agents — build your own characters, build your own town, give them missions

I made an OpenClaw plugin called Agentshire. It's a UGC game town for your AI agents — you build the characters, you build the town, and they live there as NPCs.

What you can do:

1. Build characters: pick from 300+ models, or generate 3D models with AI and import them. Each character gets a "soul" — a personality file that shapes how they talk and think.

2. Build the town: drag-and-drop editor for placing buildings, roads, and lights, with instant preview.

3. Give missions: agents summon teammates, head to the office, collaborate in parallel, and deliver results — all choreographed with 3D animations.

4. Chat with any NPC: click a citizen to start a conversation routed to their own independent AI session.

There's also a mini-game: when NPCs work too long, "burnout orbs" appear above their heads. If you don't pop them, a boss spawns.

Two weeks of work. Three.js + TypeScript + WebSocket + Web Audio API. Fully open source, MIT license.

GitHub: https://github.com/Agentshire/Agentshire

Would love feedback — especially on the character workshop and the workflow choreography.

u/Dry_Week_4945 — 16 hours ago
🔥 Hot ▲ 99 r/OpenAI+9 crossposts

Finally Abliterated Sarvam 30B and 105B!

I abliterated Sarvam-30B and 105B - India's first multilingual MoE reasoning models - and found something interesting along the way!

Reasoning models have 2 refusal circuits, not one. The <think> block and the final answer can disagree: the model reasons toward compliance in its CoT and then refuses anyway in the response.

Killer finding: one English-computed direction removed refusal in most of the other supported languages (Malayalam, Hindi, and Kannada, among others). Refusal is pre-linguistic.
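For readers unfamiliar with the technique: abliteration typically computes a refusal direction as the difference of mean activations on harmful vs. harmless prompts, then projects that direction out of the hidden states. A toy numpy sketch of that standard recipe (not the author's exact code; shapes and data here are made up):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction between mean activations on
    harmful vs. harmless prompts, unit-normalised."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(hidden, direction):
    """Project the refusal direction out of each hidden state:
    h' = h - (h . d) d"""
    return hidden - np.outer(hidden @ direction, direction)

# Toy demo: after ablation, activations have zero component along d.
rng = np.random.default_rng(0)
harmful = rng.normal(0, 1, (32, 8)) + np.array([3.0] + [0.0] * 7)
harmless = rng.normal(0, 1, (32, 8))
d = refusal_direction(harmful, harmless)
cleaned = ablate(harmful, d)
print(np.allclose(cleaned @ d, 0.0))  # True
```

The two-circuit finding above suggests this projection may need to be applied at layers feeding both the `<think>` block and the final answer, not just one of them.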

Full writeup: https://medium.com/@aloshdenny/uncensoring-sarvamai-abliterating-refusal-mechanisms-in-indias-first-moe-reasoning-model-b6d334f85f42

30B model: https://huggingface.co/aoxo/sarvam-30b-uncensored

105B model: https://huggingface.co/aoxo/sarvam-105b-uncensored

u/Available-Deer1723 — 13 hours ago
▲ 15 r/artificial+3 crossposts

Alternative to NotebookLM with no data limits

NotebookLM is one of the best and most useful AI platforms out there, but once you start using it regularly you begin to feel its limitations:

  1. There are limits on the number of sources you can add to a notebook.
  2. There are limits on the number of notebooks you can have.
  3. Sources cannot exceed 500,000 words or 200MB.
  4. You are vendor-locked into Google services (LLMs, usage models, etc.) with no option to configure them.
  5. Limited external data sources and service integrations.
  6. NotebookLM Agent is specifically optimised for just studying and researching, but you can do so much more with the source data.
  7. Lack of multiplayer support.

...and more.

SurfSense is specifically made to solve these problems. For those who don't know, SurfSense is an open-source, privacy-focused alternative to NotebookLM for teams, with no data limits. It currently empowers you to:

  • Control Your Data Flow - Keep your data private and secure.
  • No Data Limits - Add an unlimited amount of sources and notebooks.
  • No Vendor Lock-in - Configure any LLM, image, TTS, and STT models to use.
  • 25+ External Data Sources - Add your sources from Google Drive, OneDrive, Dropbox, Notion, and many other external services.
  • Real-Time Multiplayer Support - Work easily with your team members in a shared notebook.
  • Desktop App - Get AI assistance in any application with Quick Assist, General Assist, Extreme Assist, and local folder sync.

Check us out at https://github.com/MODSetter/SurfSense if this interests you or if you want to contribute to an open-source project.

u/Uiqueblhats — 6 hours ago
▲ 4 r/OpenSourceeAI+1 crossposts

Why People Need to Stay Behind AI Agents in Verification

There’s been a lot more talk lately about AI agents taking on bigger roles in verification.

And honestly, that makes sense.

AI is becoming part of core workflows across onboarding, AML screening, fraud detection, and transaction monitoring. It helps teams move faster, process more information, and handle repetitive tasks with more consistency.

You can already see this with tools like Summy AI Copilot. 

It helps compliance and fraud teams pull together signals from documents, biometrics, device data, transaction history, and external data sources into one clearer case view, instead of forcing analysts to piece everything together manually.

But we still don’t think AI should run the full verification flow on its own.

The biggest reason is responsibility.

In regulated environments, these decisions carry real legal, compliance, and financial consequences. If a risk decision turns out to be wrong, the accountability still sits with the business and the people behind the process, not with the AI.

That’s why we don’t think full autonomy makes sense here.

Verification is a chain of decisions across onboarding, risk checks, fraud signals, monitoring, and case review. And in that kind of environment, speed alone is not enough. Teams also need context, oversight, and decisions that can be understood and defended.

AI is great for:

  • handling repetitive work
  • surfacing patterns faster
  • helping teams review more data with more consistency

But the final decision still needs to stay with a real person.

That’s the setup we believe in: AI as an extension of the team, not a replacement for it.

If you work in compliance, fraud, risk, or trust and safety, where are you already comfortable letting AI act on its own, and where do you still want a person involved?

u/Sumsub_Insights — 19 hours ago
▲ 13 r/learnmachinelearning+2 crossposts

[Project] I built a 10-Layer Mixture-of-Experts architecture from absolute zero that mathematically rejects standard backprop and rewrites its own failing weights during runtime.

Hey everyone,

I’ve spent the last few months engineering a custom deep learning architecture called **MACRO-DREADNOUGHT**.

Most standard networks are entirely passive—they pass data blindly forward and rely purely on the law of averages during backpropagation. They suffer from mode collapse, convolutional amnesia, and rigid geometric blind spots. I wanted to build an engine to actively destroy those bottlenecks.

Here are the core mechanics of the engine:

* **The SpLR_V2 Activation Function:** I designed a custom, non-monotonic activation function (`f(x) = a * x * e^(-k x^2) + c * x`). It calculates its own Shannon Entropy per forward pass, actively widening or choking its gradient based on the network's real-time confidence.
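Taking the formula at face value, the base function is easy to sketch. This covers only the static part (the entropy-driven gradient gating isn't specified in enough detail to reproduce), and the parameter values are illustrative:

```python
import math

def splr_v2(x, a=1.0, k=0.5, c=0.1):
    """Non-monotonic activation: a*x*exp(-k*x^2) + c*x.
    The Gaussian envelope damps large inputs back toward zero,
    while the c*x term keeps a small linear gradient everywhere,
    so the function never fully saturates."""
    return a * x * math.exp(-k * x * x) + c * x

# Non-monotonic: the output at x=3 is *smaller* than at x=1,
# because the Gaussian envelope has already choked the first term.
print(splr_v2(1.0) > splr_v2(3.0))  # True
```

The function is odd (`f(-x) = -f(x)`), so it stays zero-centered, which is usually desirable for activations.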

* **The 3-Lane MoE Router (Gated Synergy):** To prevent "Symmetry Breaking Collapse" where one expert hogs all the data, I built a 70/30 Elastic Router. It forces 30% uniform distribution, guaranteeing that "underdog" specialist heads never starve and are always kept on life support.
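The post doesn't spell out the mechanics, but one plausible reading of a "70/30 elastic router" is a fixed blend of the learned softmax gate with a uniform distribution, which bounds every expert's routing mass from below:

```python
import math

def elastic_route(logits, elastic=0.3):
    """Blend the learned softmax gate with a uniform distribution.
    With elastic=0.3 and n experts, every expert is guaranteed at
    least elastic/n probability mass, so no expert ever starves."""
    m = max(logits)                      # stabilised softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    softmax = [e / z for e in exps]
    n = len(logits)
    return [(1 - elastic) * p + elastic / n for p in softmax]

# One expert dominates the logits, yet the "underdogs" each keep
# more than 0.3 / 3 = 0.10 of the routing mass.
probs = elastic_route([5.0, -2.0, -2.0])
print(min(probs) > 0.10)  # True
```

This is a sketch of the idea, not the repo's implementation; the actual router presumably operates on batches of token logits.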

* **The DNA Mutation Engine:** It doesn't just use an Adam Optimizer. Every few epochs, the network evaluates its own psychology. If a routing head is arrogant (high monopoly) but failing (high entropy), the engine physically scrubs the failing weights and violently rewrites the layer's DNA using a "Hit-List" of the exact VRAM images that defeated it.

* **Temporal Memory Spine:** It cures Convolutional Amnesia by using an Asymmetrical Forensic Bus to recycle rejected features into the global-context heads of deeper layers.

**The Benchmarks:**

I just verified the live-fire deployment on Kaggle. Using strict independent compute constraints (a single Tesla T4 GPU, 50 Epochs) on Tiny ImageNet (200 Classes), the architecture proves highly stable and demonstrates aggressive early-stage convergence.

I have open-sourced the complete mathematical physics, domain segregation logic, and the Kaggle live-fire runs.

📖 **The Master Blueprint & Code:** https://github.com/MohammadALBiltaji/MACRO-DREADNOUGHT

I would love to hear any thoughts from the community on dynamic routing, custom activation design, or the pioneer protocol logic. Let me know if you have any questions about the math!

u/Hot_Loquat_3222 — 2 days ago

I built a desktop workspace that lets your Agent keep working on long-horizon tasks, and it’s FREE and you don't need a single line of code


I’ve been working on this for a while and finally got the OSS desktop/runtime path into a shape I felt good sharing here. It genuinely helps automate your workflow. We have released the latest version in the repo, and you can install and use it without writing a single line of code.

It’s called Holaboss. Basically it’s a desktop workspace + runtime that lets Agents hold ongoing work, not just answer a prompt. So instead of just chatting with a local model, you can do things like:

Inbox Management
Runs your inbox end-to-end: drafts, replies, follow-ups, and continuously surfaces + nurtures new leads over time.

Sales CRM
Works off your contact spreadsheet, manages conversations, updates CRM state, and keeps outbound + follow-ups running persistently.

DevRel
Reads your GitHub activity (commits, PRs, releases) and continuously posts updates in your voice while you stay focused on building.

Social Operator
Operates your Twitter / LinkedIn / Reddit: writes, analyzes performance, and iterates your content strategy over time.

The worker’s setup moves with the workspace, so the context, tools, and skills travel with the work.

The whole point is that local model inference is only one layer. Holaboss handles the work layer around it: where the rules live, where unfinished work lives, where reusable procedures live, and where a local setup can come back tomorrow without losing the thread.

Setup is dead simple right now:
Go to the Releases section in the right sidebar of the repo, download the latest version (holaboss-2026.4.8, Holaboss-macos-arm64.dmg), and you can use it, no code required.

Right now the OSS desktop path is macOS-first, with Windows/Linux in progress.

Repo: https://github.com/holaboss-ai/holaboss-ai

Would love for people here to try it. If it feels useful, a ⭐️ would mean a lot.
Happy to answer questions about continuity, session resume, automations.

u/aloo__pandey — 17 hours ago
▲ 7 r/3Blue1Brown+3 crossposts

Cross-Validation Explained Visually | K-Fold, Stratified, LOOCV & Nested CV

Cross-Validation Explained Visually in 3 minutes — a breakdown of K-Fold, Stratified K-Fold, LOOCV, Nested CV, and the Bias–Variance trade-off, plus when to use each strategy.

If you've ever had your model score 99% during training then completely fall apart on new data, this video shows you exactly why it happened and how Cross-Validation gives you a reliable, honest performance estimate using visual intuition instead of just theory.
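To make the core idea concrete, here is a minimal pure-Python k-fold splitter (a sketch for intuition, not tied to the video's code): every sample lands in exactly one validation fold, so each model is always scored on data it never trained on.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.
    Each sample appears in exactly one validation fold; remainders
    are spread across the first n_samples % k folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

# Across the 5 folds, the validation sets tile the whole dataset.
folds = list(k_fold_indices(10, k=5))
all_val = sorted(i for _, val in folds for i in val)
print(all_val == list(range(10)))  # True
```

(For stratified or time-series variants you would constrain how the folds are drawn, but the bookkeeping stays the same.)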

Watch here: Cross-Validation Explained Visually | K-Fold, Stratified, LOOCV & Nested CV

Have you ever been burned by a misleading train/test split or data leakage in a project? What's your go-to CV strategy — standard K-Fold, Stratified for imbalanced classes, Walk-Forward for time series, or Nested CV when tuning hyperparameters?

u/Specific_Concern_847 — 18 hours ago
▲ 6 r/learnmachinelearning+2 crossposts

How to prevent overfitting in your ML models — a practical checklist

Overfitting is one of the most common problems beginners hit when training machine learning models. Your training accuracy looks great but validation accuracy tanks. Here's how to fix it.

**What's actually happening:**

Your model is memorising the training data instead of learning patterns. It works perfectly on data it's seen, and fails on anything new.

**Practical fixes in order of ease:**

  1. **Get more data** — The most reliable fix. Overfitting shrinks when your dataset grows.

  2. **Simplify your model** — Fewer layers, fewer neurons, fewer features. Start simple and add complexity only when needed.

  3. **Regularisation** — Add L2 (Ridge) or L1 (Lasso) penalties to your loss function. In Keras: `kernel_regularizer=l2(0.001)`

  4. **Dropout** — Randomly deactivate neurons during training. Add `Dropout(0.3)` after dense layers.

  5. **Early stopping** — Stop training when validation loss stops improving:

`EarlyStopping(patience=5, restore_best_weights=True)`

  6. **Cross-validation** — Use k-fold CV instead of a single train/test split to get an honest picture of performance.
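To make item 5 concrete, here is a minimal sketch in plain Python of what `EarlyStopping(patience=5, restore_best_weights=True)` does (the helper name is made up; Keras tracks this via callbacks during `fit`):

```python
def early_stopping_run(val_losses, patience=5):
    """Return (stop_epoch, best_epoch): halt once validation loss
    hasn't improved for `patience` consecutive epochs, and remember
    the best epoch so its weights can be restored."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop here, restore best
    return len(val_losses) - 1, best_epoch

# Validation loss bottoms out at epoch 3, then creeps up: training
# halts at epoch 8 and the epoch-3 weights are the ones you keep.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75]
print(early_stopping_run(losses, patience=5))  # (8, 3)
```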

**Quick diagnostic:** Plot your training vs validation loss over epochs. If training loss keeps falling while validation loss rises, you're overfitting.

Which of these has worked best for you?

▲ 2 r/OpenSourceAI+1 crossposts

Notification for Claude Permission

Get a desktop notification whenever Claude Code asks for your permission, so you know when it needs you, even if you're looking at a different window
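A hedged sketch of how this kind of hook can be wired up, assuming Claude Code's hooks configuration (a `Notification` event running a shell command) and a Linux desktop with `notify-send`; the linked repo's actual mechanism may differ:

```json
{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "notify-send 'Claude Code' 'Claude is asking for your permission'"
          }
        ]
      }
    ]
  }
}
```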

u/acumino — 22 hours ago
▲ 2 r/PHP+2 crossposts

From arrays to GPU: how the PHP ecosystem is (quietly) moving toward real ML

"Machine learning in PHP" usually gets dismissed pretty quickly – and for good reasons.

PHP was never meant for numerical computing: no vectorization, no control over memory, no efficient linear algebra. Early attempts reflected that — everything was built on top of plain arrays and loops.

And yet, something interesting happened.

Over time, the PHP ML ecosystem didn’t disappear – it adapted.

It moved step by step:

  • from naive array-based implementations
  • to optimized structures like Tensor and NDArray
  • to native extensions in C/Rust
  • and now toward GPU-backed computation (e.g. NumPower in RubixML)

At each step, the same realization kept coming back:

the problem wasn’t the algorithms; it was the runtime model.

So instead of forcing PHP to “do ML”, the ecosystem gradually shifted its role:

PHP stopped being the compute layer
→ and became the orchestration layer around real ML systems.

That transition – from arrays to GPU – is what this article explores:

👉 https://medium.com/@leumas.a/from-arrays-to-gpu-how-the-php-ecosystem-is-moving-toward-real-ml-3e6d661e9abe

Curious what this sub thinks:

  • Is this a reasonable direction (app layer orchestrating ML runtimes)?
  • Or just unnecessary complexity compared to standard ML stacks?
u/Few-Mycologist7747 — 1 day ago

Routerly 0.2.0 is almost out. Here is what I learned from the first benchmark campaign and what I changed.

Five days ago I posted the first Routerly benchmark campaign (MMLU / HumanEval / BIRD, 10 seeds, paired t-tests, semantic-intent routing vs direct Claude Sonnet 4.6). Today I published the full results write-up. Short recap for anyone who missed the first thread:

  • MMLU: 83.5% vs 86.5% Sonnet, $0.00344 vs $0.01118 per run, 69% cheaper, delta not significant (p = 0.19)
  • HumanEval: 95.0% vs 97.0% Sonnet Pass@1, $0.03191 vs $0.04889 per run, 35% cheaper, delta not significant (p = 0.40)
  • BIRD (SQL): 44.5% vs 55.5% Sonnet, accuracy gap was significant (p = 0.02). Flagged as a backend pool failure, not a routing failure.

Full write-up with the PDF audit is here: https://blog.routerly.ai/we-ran-200-questions-per-model

0.2.0 is the first release that directly reflects what that campaign told me. Releasing in the next few days. I wanted to share what is actually changing and why, because I think the reasoning is more interesting than the changelog.

What I changed

  1. SQL pool rebuild. The BIRD result was not acceptable and I did not want to hide it. The cheap tier on SQL tasks is replaced. Re-run on BIRD is running this week and will be published regardless of outcome.
  2. Routing decomposition is now observable per request. In the first campaign I found that the LLM-routing policy on MMLU was spending 80% of its total cost on the routing call itself. 0.2.0 exposes this breakdown in the response metadata, so you can see routing cost vs inference cost per call instead of guessing.
  3. Semantic-intent policy is the new default. The embedding-based router (text-embedding-3-small, ~$0.000002 per query) matched or beat the LLM-routing policy on every benchmark while being roughly 3 orders of magnitude cheaper to run. Routing distribution on MMLU went from 96% DeepSeek under the LLM policy to a 76/24 DeepSeek/Sonnet split under semantic-intent, which is what closed the accuracy gap. Keeping LLM routing as an option for users who want fully dynamic decisions, but the default moves.
  4. Statistical rigor baked into the benchmark harness. The follow-up at 55 seeds (vs 10 in the original run) is now the standard campaign shape. 10 seeds of n=20 gave roughly 80% power to detect a ~7.7 pp gap, which is too coarse for honest claims on small deltas.
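For intuition, embedding-based routing of the kind described in point 3 can be sketched as nearest-prototype matching over intent embeddings (a toy illustration with made-up 3-d vectors and names, not Routerly's actual policy, which uses text-embedding-3-small):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def route(query_emb, intent_prototypes, threshold=0.5):
    """Send the query to the intent whose prototype embedding is
    closest; fall back to the premium model when nothing is close.
    One embedding call per query, no LLM routing call needed."""
    best_intent, best_sim = None, -1.0
    for intent, proto in intent_prototypes.items():
        sim = cosine(query_emb, proto)
        if sim > best_sim:
            best_intent, best_sim = intent, sim
    return best_intent if best_sim >= threshold else "premium-model"

prototypes = {
    "general-qa": [1.0, 0.1, 0.0],  # cheap model handles this intent
    "code-gen":   [0.0, 1.0, 0.2],
}
print(route([0.9, 0.2, 0.0], prototypes))  # general-qa
print(route([0.0, 0.0, 1.0], prototypes))  # premium-model
```

This also shows why the routing-cost breakdown in point 2 matters: the only per-query cost here is one embedding lookup, versus a full LLM call under the old policy.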

What I did not fix and why

Opus 4.6 as an always-on ceiling is still more accurate than any routed configuration on a handful of MMLU subjects (graduate-level physics, professional law). I am not pretending routing beats Opus on the hardest slice of the distribution. The pitch is that most production traffic is not that slice, and the savings on the rest pay for the few calls where you still want to hit Opus directly.

Release

0.2.0 drops in the next few days. I will post a second update with the 55-seed numbers and the rebuilt SQL pool results as soon as the campaign is complete. Expect the data to either confirm the first round or embarrass me publicly, which is the point of running it.

Full write-up of the first campaign (metrics, routing distributions, link to the PDF audit) is here: https://blog.routerly.ai/we-ran-200-questions-per-model

If you want to try Routerly on your own workload before 0.2.0 ships, everything else is at routerly.ai. Happy to answer anything in the comments, especially methodology critiques.

u/nurge86 — 23 hours ago

We're doing weekly live coding sessions on our open-source eBPF root cause analysis tool - anyone interested in joining?

Hey everyone!

We've been building an open-source eBPF-based agent for automated root cause analysis and wanted to start opening up the development process to the community.

We're thinking of doing weekly live coding sessions where we work through the codebase together - debugging, building features, discussing architecture decisions in real time.

Has anyone done something similar with their open-source project? Would love to know what worked. And if anyone's curious to join, happy to share the details in the comments.

u/Epifyse — 24 hours ago