u/Kharki_Lirov — 9 days ago
▲ 1 r/LocalLLM+1 crossposts

Hey everyone,

Over the last few days I’ve been working on a small but functional deep learning framework called **MotifCL** — built from scratch on pure OpenCL + C++17, specifically targeting legacy AMD cards (Polaris and similar) where ROCm is dead or painful.

**Current features:**
- Eager autograd
- Register-blocked matmul with auto-tuning
- Tiled FlashAttention (forward + backward)
- Full masked GQA/MQA support + KV-cache inference
- Quantization (Q4_0, Q8_0, mixed)
- Python bindings
- Modern GPT-style model with GQA
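For readers unfamiliar with the quantization formats listed above: Q4_0-style schemes store weights in fixed-size blocks, each with one shared floating-point scale plus 4-bit codes. This is a rough NumPy sketch of the general idea (I haven't checked MotifCL's actual layout, which may differ in scale convention and packing):

```python
import numpy as np

QK = 32  # block size commonly used by Q4_0-style formats

def q4_0_quantize(w):
    """Quantize a flat float array (length divisible by 32) into
    per-block scales `d` and 4-bit codes `q` in [0, 15]."""
    blocks = w.reshape(-1, QK)
    amax = np.max(np.abs(blocks), axis=1, keepdims=True)
    d = np.where(amax == 0, 1.0, amax / 8.0)       # one scale per block
    q = np.clip(np.round(blocks / d) + 8, 0, 15)   # offset-binary codes
    return d, q.astype(np.uint8)

def q4_0_dequantize(d, q):
    # Undo the +8 offset and rescale; error is bounded by the block scale
    return (q.astype(np.float32) - 8) * d
```

The payoff on an 8 GB card is memory: 32 weights cost 16 nibbles plus one scale instead of 128 bytes of FP32, roughly a 6-7x reduction.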

**Results on RX 580 8GB:**
- 10.57M-parameter GPT (legacy architecture) → **~89 ms/step** (~1440 tokens/sec) training on Shakespeare
- Modern Transformer forward (seq=128) → ~32 ms
- 1-token decode → ~116 tok/s
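As a quick sanity check on the training numbers above (assuming throughput is simply tokens processed per step divided by step time):

```python
def tokens_per_step(throughput_tok_s, step_ms):
    """Tokens processed per optimizer step implied by a throughput figure."""
    return throughput_tok_s * step_ms / 1000.0

# ~1440 tok/s at ~89 ms/step implies ~128 tokens per step,
# i.e. one sequence of length 128 (or an equivalent batch x seq product).
```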

Interesting finding: **FP16 is often slower than FP32** on Polaris. GCN4 executes FP16 at the same 1:1 rate as FP32 (packed half2 math only arrived with Vega), so half precision buys no ALU throughput while adding conversion overhead.

---

Questions for the community:

  1. Are any of you still training or running LLMs on Polaris/Vega cards in 2026? How’s your experience?
  2. Is it worth continuing development of an OpenCL-based framework like this, or is it a dead end?
  3. What features would you want most in such a project?
  4. Any specific OpenCL/Polaris quirks, bugs, or optimization tricks I should know about?

Would really appreciate any feedback, criticism, or ideas.

Repo: https://github.com/kharkilirov1/MotifCL

(The project is literally a few days old, so it's still rough but under active development.)

u/Kharki_Lirov — 8 days ago
▲ 3 r/OpenSourceeAI+2 crossposts

Hi, I open-sourced a project I’ve been building: Cognitive Project Layer (CPL).

The problem: coding agents often spend too much time rediscovering project structure through blind grep / repeated file reads.

CPL gives agents a local, inspectable context layer:

- project skeleton / entry points / configs
- symbol and reference index
- graph-aware retrieval
- confidence scoring + fallback plans
- CLI, MCP stdio, and local HTTP API
- eval fixtures and benchmarks

It’s written in Rust and licensed under Apache-2.0.

Warm MCP retrieval on local fixtures takes ~15–30 ms after the initial layer build.
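To make "graph-aware retrieval" concrete, here's a toy Python sketch of the general technique (this is not CPL's actual Rust implementation or API; all names are made up): seed from symbols matching the query, then expand along the reference graph with a per-hop confidence decay.

```python
from collections import deque

def graph_retrieve(query, symbol_index, ref_graph, max_hops=2, decay=0.5):
    """Breadth-first expansion over a symbol reference graph.

    symbol_index: dict of symbol name -> file location
    ref_graph:    dict of symbol name -> referenced symbol names
    Returns {symbol: confidence}; seeds score 1.0, decaying per hop.
    """
    seeds = [s for s in symbol_index if query in s]
    scores = {s: 1.0 for s in seeds}
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        sym, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for ref in ref_graph.get(sym, ()):
            score = scores[sym] * decay
            if score > scores.get(ref, 0.0):  # keep best path only
                scores[ref] = score
                frontier.append((ref, hops + 1))
    return scores
```

The point versus blind grep is that one lexical hit pulls in its structurally related neighbors with graded confidence, instead of forcing the agent to re-read files to find them.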

Repo: https://github.com/kharkilirov1/cognitive-project-layer

I’d appreciate feedback from people building coding agents, MCP tools, or code-search / dev-tooling systems.

u/Kharki_Lirov — 11 days ago
▲ 7 r/OpenSourceeAI+5 crossposts

Hi everyone,

I’m an independent researcher and I’m looking for feedback on a preliminary ML paper I recently published.

It is about structure-preserving adaptation of pretrained Transformer models through exact factorization of selected modules and small trainable updates.
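Going only by that description (I haven't checked it against the paper, so this may not match the actual method), one way such a scheme could look: exactly factor a pretrained weight matrix, freeze the factors so the base function is preserved, and train only a small additive update. A hypothetical NumPy sketch:

```python
import numpy as np

def factorize_and_adapt(W, rank):
    """Exactly factor W via SVD (frozen), attach a trainable low-rank
    update A @ B. All names and the scheme itself are illustrative."""
    U_, S, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_ * S                          # fold singular values into the left factor
    V = Vt                              # U @ V reconstructs W exactly
    A = np.zeros((W.shape[0], rank))    # trainable, zero-initialized
    B = np.zeros((rank, W.shape[1]))    # trainable

    def forward(x):
        # frozen exact base + small trainable correction
        return x @ (U @ V).T + x @ (A @ B).T

    return forward, (A, B)
```

With a zero-initialized update, the adapted layer starts out exactly equal to the pretrained one, which is the "structure-preserving" property the abstract seems to describe.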

I would appreciate any comments on the idea, experiments, writing, or limitations.

I’m also looking for an arXiv cs.LG endorsement if anyone is willing to help.

Paper / files: https://zenodo.org/records/19839389

Code: https://github.com/kharkilirov1/motif_upcycling

Thank you.

u/Kharki_Lirov — 16 days ago