r/MachineLearningAndAI

▲ 9 r/MachineLearningAndAI+2 crossposts

Looking for pros and students to test a 100% offline annotation tool (Runs on 2015 hardware) [p]

I'm tired of web platforms forcing us to upload everything to the cloud. So, I built LensLaber, an offline-first computer vision annotation tool.

I developed the whole thing on my everyday laptop: an old 2015 Asus X550LD (i5, 8GB RAM, 120GB SSD). I wanted to prove you don't need an expensive GPU workstation for AI labeling. By optimizing the architecture, YOLO + MobileSAM run locally on a standard CPU using just 600-900MB of RAM. You just bring your own YOLO weights in ONNX format.

Harry Ratcliffe (Applied AI Ecosystem Leader) reviewed the architecture and was surprised mainly by how smoothly both models run on this 2015 hardware. He validated that this local, low-RAM setup is exactly what high-security sectors like medical imaging or manufacturing actually need.

Now I need honest testing, not casual "looks nice" comments. Whether you're a professional managing sensitive enterprise data or a student working on a class project, I need people to actually run real datasets through it. That’s the only way to see how the UI handles real-world friction.

Your time matters. If you provide active feedback during this beta, I’ll give you a lifetime free license for the final release.

Download the beta here: https://lenslaber.github.io

I'll be hanging around the comments to fix any bugs you find. Let me know your thoughts.

u/LensLaber — 4 hours ago
▲ 35 r/MachineLearningAndAI+12 crossposts

Machine Learning Concepts [D]

Dear folks, I have created a range of free content on machine learning (a work in progress). I am a data scientist with a postgraduate degree in AI/ML from IIT. To help the machine learning community with important concepts, I have made multiple long-form videos and organized them into topic-wise, digestible playlists.

If you go through the first two playlists:

Introductory Machine Learning Concepts
Probability Foundations: Univariate Models

You might find the content helpful. I have tried to explain with intuitions and derivations, and this is a work in progress. For code implementations, the scikit-learn website has great content as well. In total, the playlists have 60+ topic-wise videos so far, and I think they can help a lot whether you are starting out with the concepts, working through the mathematics, or preparing for AI/ML/data job interviews.

When I sat for my interviews, I was grilled on my project, but the majority of the questions about it tested foundational concepts and their know-how.

This content is FREE on YouTube. It is for the benefit of the learning community.

Link: https://youtube.com/@aayushsugandh4036?si=w5MKORU2fWzLRrAJ

u/Negative_War_65 — 1 day ago
▲ 59 r/MachineLearningAndAI+31 crossposts

Open-source CLI for red-teaming LLM agents before they touch tools and memory

Sharing RedThread, an open-source CLI for AI red-team campaigns:

https://github.com/matheusht/redthread

The angle is AI agents as an attack surface. Prompt injection gets more interesting once the model can call tools, delegate to workers, write memory, retry failed actions, or propose guardrail changes.

RedThread is built for staging/internal targets. It runs LLM red-team campaigns, records traces, scores failures, and can replay exploit and benign cases before treating a defense as evidence.

Current pieces:

  • PAIR, TAP, Crescendo, and GS-MCTS attack flows
  • JudgeAgent/rubric scoring
  • replay-backed defense proposals
  • telemetry/drift signals
  • agentic checks for tool poisoning, confused deputy paths, canary propagation, and budget amplification

It is not a magic prompt shield and not broad production enforcement.

Looking for people who test agent workflows and can suggest realistic failure cases or target adapters.

u/Apprehensive-Zone148 — 9 days ago
▲ 12 r/MachineLearningAndAI+1 crossposts

MindTrial: GLM 5.2 is ~6x faster than GLM 5.1, but slightly lower on strict score

Added Z.AI GLM 5.2 to my MindTrial leaderboard.

Results on the 39 text-only tasks:

  • GLM 4.7: 13/39
  • GLM 5: 27/39
  • GLM 5.1: 32/39
  • GLM 5.2: 30/39

So GLM 5.2 does not beat GLM 5.1 on strict raw score. It finished with 30 passes, 7 failures, and 2 hard errors.

But the speed difference is the interesting part:

  • GLM 5.1: about 4h04m
  • GLM 5.2: about 40m46s

That is roughly a 6x speedup, with only a 2-task drop in raw score.
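As a quick check of the ratio above, the two reported runtimes can be converted to seconds and divided. This is a minimal sketch; `to_seconds` is a helper name made up for this example, and the inputs are the approximate runtimes and pass counts listed in this post.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s runtime to seconds (hypothetical helper)."""
    return hours * 3600 + minutes * 60 + seconds

glm_51_runtime = to_seconds(hours=4, minutes=4)      # about 4h04m
glm_52_runtime = to_seconds(minutes=40, seconds=46)  # about 40m46s

speedup = glm_51_runtime / glm_52_runtime
score_drop = 32 - 30  # GLM 5.1 passes minus GLM 5.2 passes, out of 39

print(f"speedup: {speedup:.2f}x, score drop: {score_drop} tasks")
# prints "speedup: 5.99x, score drop: 2 tasks"
```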

Manual review also suggests several GLM 5.2 failures were not pure reasoning misses, but rather "output-discipline" issues. In multiple cases, the right answer seemed to be present in the output, but not in the exact required final format.

Main takeaway: GLM 5.2 looks like a major practical speed upgrade over GLM 5.1, but not a clean strict-benchmark upgrade on MindTrial text reasoning.

petmal.net
u/Correct_Tomato1871 — 7 days ago
▲ 10 r/MachineLearningAndAI+7 crossposts

Multi-Agent Self-Correction Failure Modes & Context Window Inflation — Traced Completely By Hand (No Wrapper Frameworks)

Hey,

We’ve all seen the tutorials preaching the power of Worker-Critic multi-agent setups. But in production, without strict deterministic bounds, you hit a massive architectural wall: The Infinite Hallucination Trap.

If your agents are stuck optimizing for competing constraints, they can easily enter an endless reflection loop—burning tokens, inflating your context window, and running up insane API bills.

To understand exactly why this happens under the hood, I spent this weekend breaking down a dual-agent debugging loop entirely BY HAND using pencil, paper, and state error matrices. No LangChain, no framework fluff—just raw token mechanics.

Here is the breakdown of the first-principles tracing exercise I put together for Workbook 4 of my engineering series:

  1. THE SCENARIO

We track an automated multi-agent patch system trying to fix a legacy multi-threaded bug under two conflicting constraints:

- Constraint A: Eliminate a memory leak (No dangling pointers)

- Constraint B: Maintain thread safety (No race conditions)

  2. THE SYSTEM MATRIX DISCOVERY

- At t=1: The Worker generates Patch_v1. Leak resolved, but thread safety is broken (E_thread = 4).

- At t=2: The Critic catches the error. The Worker over-corrects with a heavy global mutex, shifting the stack allocation frame. Thread safety is fixed, but the leak is completely re-introduced (E_leak = 4).

- At t=3: The Worker panics, strips the mutex, rolls back to a version of Patch_v1, and the system resets back to the exact numerical state of t=1.

  3. THE MATHEMATICAL TRAP

By tracking the progress delta (Delta E = |E_t - E_{t-2}|), we can mathematically prove when the system hits a dead stop. At step t=3, Delta E drops to an absolute 0.0, yet the overall system error remains stuck at E_t = 4.

The agentic system’s velocity collapses to zero before reaching a valid production state. It’s trapped in a perfect, non-converging limit cycle error orbit.

  4. THE BARE-METAL CIRCUIT BREAKER

To solve this without throwing generic execution exceptions, I mapped out a deterministic Circuit Breaker Gate in raw Python that checks this exact zero-velocity threshold and freezes the system state matrix natively before the API call chain loops infinitely.
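For readers who want something concrete, here is a minimal sketch of what such a gate could look like. It is not the code from the article. The class name `CircuitBreaker`, the `epsilon` tolerance, and the choice to compare the (E_leak, E_thread) pair component by component against the state two steps back are all assumptions made for illustration. The three states fed in are the ones from the trace above.

```python
from dataclasses import dataclass, field

@dataclass
class CircuitBreaker:
    """Stops a Worker-Critic loop that returns to the state it had two steps ago."""
    epsilon: float = 1e-9          # hypothetical zero-velocity tolerance
    history: list = field(default_factory=list)
    tripped: bool = False

    def record(self, e_leak: float, e_thread: float) -> bool:
        """Store this step's errors; return True if the loop should stop."""
        self.history.append((e_leak, e_thread))
        if e_leak + e_thread == 0:
            return True            # converged normally, breaker not tripped
        if len(self.history) >= 3:
            prev_leak, prev_thread = self.history[-3]
            # Delta E = |E_t - E_{t-2}|, summed over both constraints (assumption)
            delta = abs(e_leak - prev_leak) + abs(e_thread - prev_thread)
            if delta <= self.epsilon:
                self.tripped = True  # no progress while error is still nonzero
                return True
        return False

# Error states from the hand trace: (E_leak, E_thread) at t=1, 2, 3
trace = [(0, 4), (4, 0), (0, 4)]

breaker = CircuitBreaker()
stopped_at = None
for t, (e_leak, e_thread) in enumerate(trace, start=1):
    if breaker.record(e_leak, e_thread):
        stopped_at = t
        break

print(f"stopped at t={stopped_at}, tripped={breaker.tripped}")
```

In a real loop the `record` call would sit right before the next API call, so a tripped breaker prevents the next Worker request from being sent.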

I’ve uploaded a full walkthrough article including the raw Python simulation code, a solved reference matrix, and an empty workbook PDF if you want to work through the token tracking math at your own lab bench.

I'd love to hear how you guys are natively catching non-convergence in your agent architectures!

👇 [Link to the Full Substack Breakdown & Free Workbook PDF in the Comments]

https://open.substack.com/pub/ayushmansaini/p/inside-the-infinite-hallucination?r=4zl69k&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

u/ParsleyMaximum1702 — 8 days ago

Looking for Programming buddies

Hey everyone, I have made a group for programming folks to learn, grow, and connect with each other.

From beginners to advanced, we help each other and provide guidance to everyone in our community. You can also network with each other.

Anyone interested is free to DM me anytime.

I will also drop the link in the comments.

u/MAJESTIC-728 — 9 days ago
▲ 4 r/MachineLearningAndAI+1 crossposts

Need Help in Creating an ML model for predicting stock prices using Nifty-50 historical data

Hello everyone,

I am working on an open summer project from one of my college's societies.
The task is to create three modules (a stock predictor, a risk assessment module, and a portfolio builder) using the following data:

https://www.kaggle.com/datasets/rohanrao/nifty50-stock-market-data/

(The data consists of OHLCV values for the top 50 companies on the NSE from Jan 2000 to Apr 2021.)

My dilemma is how the project wants my predictor to work.
Should I train the model on data up to Apr 2021 and have the user input the number of days ahead for which a forecast is required? In that case I will have no data left to test the model and compute the evaluation metrics (MAE, RMSE, R^2 score, etc.) required by the problem statement.

Or should I split the data, so that the user is effectively placed at the point in the timeline where my training data ends, say Dec 2018, with the rest used for testing?
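To make the second option concrete, here is a rough sketch of a chronological split with the three metrics computed on the held-out period. Everything in it is illustrative: the `rows` series is a made-up placeholder (not real Nifty-50 data), the cutoff date follows the Dec 2018 example, and the "model" is just a persistence baseline that predicts yesterday's close. The function names are made up for this example.

```python
import math

def chronological_split(rows, cutoff):
    """Split (date, close) rows: train up to the cutoff date, test after it."""
    train = [r for r in rows if r[0] <= cutoff]
    test = [r for r in rows if r[0] > cutoff]
    return train, test

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    return mae, rmse, 1 - ss_res / ss_tot

# Placeholder monthly closes, NOT real market data. ISO dates sort correctly as strings.
rows = [(f"20{18 + i // 12}-{i % 12 + 1:02d}-01", 100.0 + 2.0 * i + (i % 3))
        for i in range(24)]

train, test = chronological_split(rows, cutoff="2018-12-31")

# Persistence baseline: predict each test close as the previous observed close.
closes = [close for _, close in train + test]
y_true = closes[len(train):]
y_pred = closes[len(train) - 1:-1]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"train={len(train)} test={len(test)} MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

The key point is that the split is by date, never shuffled, so the test period always lies after everything the model saw during training.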

Any suggestions will be highly appreciated.

P.S. I have also attached the PDF of the problem statement (PS).

https://drive.google.com/file/d/1Zzfz5_0Rwi79MkZ7Ba5H8oE5U7xiRMHq/

u/Prakhar-on-reddit — 12 days ago
▲ 13 r/MachineLearningAndAI+3 crossposts

Claude Fable 5 (Mythos) lands near the top of MindTrial — 80/98 with zero hard errors

Added Anthropic Claude Fable 5 to my MindTrial leaderboard.

This is a strong Anthropic update:

  • Claude Fable 5: 80/98 overall, 0 hard errors
  • Claude 4.8 Opus: 73/98 overall, 5 hard errors
  • Text tasks: Fable hit 39/39, vs 35/39 for Opus 4.8
  • Runtime improved a lot too: ~3.02h for Fable vs ~5.03h for Opus 4.8

It lands right in the top tier of the 98-task board:

  • GPT-5.5: 86/98
  • Gemini 3.1 Pro: 81/98
  • Claude Fable 5: 80/98
  • GPT-5.4: 80/98
  • Gemini 3.5 Flash: 77/98
  • Claude 4.8 Opus: 73/98

The interesting caveat: Fable did not clearly improve the newer visual2 subset. It scored 17/26 there, slightly below Opus 4.8 at 18/26 and well below GPT-5.5 / Gemini 3.1 Pro at 22/26.

Tool use looked cleaner overall: fewer Python calls than Opus 4.8 and fewer 10-call cap hits. So the main gain seems to be reliability, speed, text performance, and original visual tasks — not a clean sweep on the hardest new visual2 tasks.

Main takeaway: Claude Fable 5 is a real Anthropic leap in MindTrial, but not the new overall leader.

petmal.net
u/Correct_Tomato1871 — 13 days ago