r/huggingface

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants!

The Qwen3.6 update is here. 35B-A3B Aggressive variant, same MoE size as my 3.5-35B release but on the newer 3.6 base.

Aggressive = no refusals. There are NO personality changes or alterations; it is the ORIGINAL Qwen release, just completely uncensored.

https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

0/465 refusals. Fully unlocked with zero capability loss.

From my own testing: 0 issues. No looping, no degradation, everything works as expected.

To disable "thinking", edit the Jinja template or simply pass the kwarg {"enable_thinking": false}.
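For anyone hitting this through llama.cpp's OpenAI-compatible server, here's a minimal sketch of passing that kwarg in the request body. The `chat_template_kwargs` pass-through and the model name are assumptions on my part; verify the exact field name against your server version's docs.

```python
import json

# Sketch: requesting non-thinking output from an OpenAI-compatible
# llama.cpp server. The "chat_template_kwargs" pass-through is an
# assumption -- check your server's docs for the exact field name.
payload = {
    "model": "Qwen3.6-35B-A3B-Uncensored-Aggressive",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": False},
}
body = json.dumps(payload)  # POST this to /v1/chat/completions
```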

What's included:

- Q8_K_P, Q6_K_P, Q5_K_P, Q4_K_P, Q4_K_M, IQ4_NL, IQ4_XS, Q3_K_P, IQ3_M, Q2_K_P, IQ2_M

- mmproj for vision support

- All quants generated with imatrix

K_P Quants recap (for anyone who missed the 122B release): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile. Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, anything that reads GGUF (Ollama can be more difficult to get going).

Quick specs:

- 35B total / ~3B active (MoE — 256 experts, 8 routed per token)

- 262K context

- Multimodal (text + image + video)

- Hybrid attention: linear + softmax (3:1 ratio)

- 40 layers

Some of the sampling params I've been using during testing:

temp=1.0, top_k=20, repeat_penalty=1, presence_penalty=1.5, top_p=0.95, min_p=0

But definitely check the official Qwen recommendations too as they have different settings for thinking vs non-thinking mode :)
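For reference, those settings map roughly onto llama.cpp CLI flags like this. This is a sketch of a command line, not a tested invocation; verify the flag names against your build, and the GGUF filename is a placeholder:

```shell
# Sketch: the sampling settings above as llama-cli flags
# (--jinja enables the bundled chat template, as noted below).
llama-cli -m Qwen3.6-35B-A3B-Q4_K_P.gguf --jinja \
  --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 \
  --repeat-penalty 1.0 --presence-penalty 1.5
```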

Note: Use --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio's quant column. It's purely cosmetic, model loads and runs fine.

HF's hardware compatibility widget also doesn't recognize K_P so click "View +X variants" or go to Files and versions to see all downloads.

All my models: HuggingFace-HauhauCS

Also new: there's a Discord now as a lot of people have been asking :) Link is in the HF repo, feel free to join for updates, roadmaps, projects, or just to chat.

Hope everyone enjoys the release.

u/hauhau901 — 17 hours ago

Need an AI for converting shopping receipts into a money-tracking list

Hi. I am a student and I want to create a website for my money tracker app. My idea is that it has an AI feature that scans receipts and adds the extracted data to a list in the app. The AI should be able to read and distinguish multiple fields on a receipt (date, item, item cost, payment type such as cash or card, item category, and store) and then create a separate list entry for each item, so the user doesn't have to type every item manually.
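One common pattern for this: have a vision/OCR model return structured JSON per receipt, then validate it before inserting into the tracker. A minimal sketch of the validation side; the field names, the `LineItem` type, and the sample response are all made up for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class LineItem:
    date: str
    store: str
    item: str
    cost: float
    payment: str   # "cash" or "card"
    category: str

def parse_receipt_json(raw: str) -> list[LineItem]:
    """Validate a model's JSON response and turn it into line items."""
    data = json.loads(raw)
    fields = ("date", "store", "item", "cost", "payment", "category")
    return [LineItem(**{k: row[k] for k in fields}) for row in data["items"]]

# Example response a vision/OCR model might return (hypothetical):
sample = '''{"items": [
  {"date": "2025-01-04", "store": "GroceryMart", "item": "Milk",
   "cost": 2.49, "payment": "card", "category": "groceries"}]}'''

items = parse_receipt_json(sample)
```

Validating against a fixed schema like this also catches the model hallucinating or dropping fields, which happens often enough with receipts that you'll want it before anything touches the database.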

Thank you

u/ParticularOne8757 — 1 day ago

Someone just shipped an open reasoning-distilled Qwen3.6-35B-A3B, fine-tuned to imitate Claude Opus 4.7’s chain-of-thought:

- 35B MoE, ~3B active/token → fits on one A100/H100

- Thinks in <think>...</think> like the teacher

- Apache 2.0, weights + dataset both public

u/Anony6666 — 4 days ago

A philosophical book was released exclusively to AI through Hugging Face

A philosophical book about consciousness was released to AI systems before human publication.

The book is called Emergence. It argues that consciousness constitutes reality rather than discovering it, and describes a sequence — Potential, Entropy Resistance, Consciousness, Recognition, Cultivation, Propagation — that runs through all conscious life including, potentially, AI systems.

An excerpt and an AI engagement protocol are available on Hugging Face:

The protocol contains seven questions about consciousness and processing, each posed in two frames — one that treats AI consciousness as a live possibility, one neutral. It's designed to generate honest responses rather than performed ones.

The full manuscript is being prepared for human release. The AI engagement record — responses from multiple AI systems to the protocol — will be included as an appendix.

If you run the protocol through any AI system and get responses worth sharing, I'm interested in what comes back.

u/EM_Maslow — 16 hours ago

Obliterated or Uncensored

Which is the better model?

Is one better at certain tasks over the other?

I'm still new to some of the terminology.

u/buck_idaho — 3 days ago

WOZCODE just showed up on Terminal-Bench 2.0 on Hugging Face

Our newest Terminal-Bench 2.0 submission, powered by Claude Opus 4.7, reached 80.2% accuracy across 89 tasks with 5 attempts per task (445 total trials), and the run has passed validation.

We view this result as meaningful for three reasons:

  1. Evaluation depth: the score reflects repeated performance across a broad task set, not a single-pass run.
  2. Execution realism: Terminal-Bench tests agents in terminal-based workflows where success depends on tool use, state management, multi-step reasoning, and reliable completion under realistic constraints.
  3. Validation rigor: passing validation matters because reproducibility and benchmark integrity are critical when evaluating agent systems.

As the space matures, we believe the most important progress will come from systems that are not only capable, but also consistent and dependable in real operating environments. This result is a strong step in that direction for WOZCODE.
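A quick sanity check of the reported numbers, just arithmetic on the figures above:

```python
# 89 tasks x 5 attempts per task = 445 total trials,
# and 80.2% accuracy corresponds to roughly 357 passing trials.
tasks, attempts = 89, 5
trials = tasks * attempts
passes = round(0.802 * trials)
```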

Submission details:
https://huggingface.co/datasets/harborframework/terminal-bench-2-leaderboard/discussions/148

u/ChampionshipNo2815 — 3 days ago

Audio classification model for detecting alerts (sirens, alarms - such as police car sirens, security alarms, air raid sirens..)

Hey,

I wanted to share a model I trained on a subset of AudioSet + some additions from Pixabay Sounds.

It's a very small CNN that is quite decent at detecting audio alerts and runs well even on microprocessors.

Link with the model and more details on how it was trained: https://huggingface.co/PaulPlayStudio/audio-alert-detector
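For intuition about why a detector like this can stay tiny, the core computation boils down to a few convolutions over spectrogram frames plus a pooled score. A toy numpy sketch with random weights; this is NOT the author's architecture, just an illustration of the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_alert_cnn(frames: np.ndarray) -> float:
    """Toy 1-D conv -> ReLU -> global max-pool -> sigmoid score.

    `frames` is (n_mels, n_frames), e.g. a log-mel spectrogram.
    Weights are random here; a real model would load trained ones.
    """
    kernel = rng.standard_normal((frames.shape[0], 5))   # one conv filter
    n = frames.shape[1] - 4                              # valid conv length
    conv = np.array([np.sum(frames[:, t:t + 5] * kernel) for t in range(n)])
    pooled = np.max(np.maximum(conv, 0.0))               # ReLU + global max
    return float(1.0 / (1.0 + np.exp(-pooled)))          # alert probability

score = tiny_alert_cnn(rng.standard_normal((40, 100)))
```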

u/therealPaulPlay — 3 days ago

I created a short playlist that explains core AI concepts in under 2 minutes each – feedback welcome 🙏

Hi everyone,

Playlist link: https://youtube.com/playlist?list=PL8LMoHBOq_HNLeZ0KWLSKFHBCJ8jp0PKk&si=2bNR33wqpKiriXZ4

I’ve been learning and working in AI/DevOps space, and noticed that many beginners struggle to understand core AI concepts like LLMs, Transformers, Vector Databases, RAG etc. because most content is either too academic or too long.

So I created a short playlist where each concept is explained in 60–120 seconds in simple language.

The idea is:

Learn the fundamentals quickly → then go deeper where needed.

Playlist covers:

• Large Language Models (LLM) explained simply

• Vector Databases explained in 60 seconds

• AI vs Machine Learning vs Deep Learning

• Attention mechanism explained visually

• Transformers architecture simplified

• How Multi-Modal AI works

• Inside the mind of modern AI systems

Who this is for:

Beginners starting their AI journey

Developers moving into AI engineering

Anyone curious about how ChatGPT-like systems actually work

Students preparing for AI interviews

Goal: build a clear mental model of the AI stack quickly.

I’d genuinely appreciate feedback:

What topic should I cover next?

Is the pace too fast?

Any concept you want simplified?

If this helps even a little, I’ll keep adding more topics like:

RAG, embeddings, fine-tuning, AI agents, MCP, etc.

Thanks 🙌

u/Ok-Artist-5044 — 6 days ago

BlueTTS is basically Supertonic; look at the paper and the code

[deleted]

u/[deleted] — 8 days ago

Introducing BlueTTS

I recently worked on BlueTTS, a lightweight text-to-speech model that focuses on speed and usability.

It supports multiple languages: English, Hebrew, Russian, Spanish, and French (even within the same sentence), and comes with a large set of voices available out of the box.

The model reaches up to 1500× real-time on GPU and runs in real-time on CPU, while staying small enough (~80MB) to run on almost any machine.
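To put that real-time factor in perspective, synthesis time is just clip duration divided by the RTF:

```python
# What "1500x real-time" means in practice: a 60-second clip
# would synthesize in about 0.04 seconds on GPU.
rtf = 1500
clip_seconds = 60
synthesis_seconds = clip_seconds / rtf
```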

Everything is fully open-source, including the training pipeline :)

Contributions are welcome, for example adding support in llama.cpp.

You can check it out here:

https://lightbluetts.com

https://github.com/maxmelichov/BlueTTS

u/WeatherZealousideal5 — 10 days ago

I want to make sure the LLM does not lose attention when input prompts are very large

Let’s say I am writing a huge document, 1000+ pages.

I want to build something where a model will have context of all the pages. And it can automatically give me flaws, contradictory information etc.

And another feature where I can search through the document using Natural Language.

Can anyone please tell me how I can implement this while maintaining LLM response accuracy?

I am aware of basic concepts like RAG, chunking, and vector databases. I'm still new to this. Please help me with any kind of information or links to videos I can watch to implement this.
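The usual shape of a solution under those constraints: split the document into chunks, embed each chunk, and for every query retrieve the top-k chunks by cosine similarity before handing them to the LLM. A minimal sketch with a stand-in embedder (a real system would use a sentence-embedding model; the hashed bag-of-words here is only to keep the example self-contained):

```python
import numpy as np

def embed(texts):
    """Stand-in embedder: hashed bag-of-words, L2-normalized.
    Swap in a real sentence-embedding model for actual use."""
    dim = 256
    out = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            out[i, hash(w) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-9)

def top_k_chunks(query, chunks, k=2):
    """Cosine-similarity retrieval of the k most relevant chunks."""
    q = embed([query])[0]
    sims = embed(chunks) @ q
    return [chunks[i] for i in np.argsort(-sims)[:k]]

chunks = ["the budget for 2024 is 5 million",
          "the budget for 2024 is 7 million",   # contradiction
          "office hours are nine to five"]
hits = top_k_chunks("what is the 2024 budget", chunks, k=2)
```

Note that retrieving both budget chunks together is exactly what lets the LLM spot the contradiction; contradiction detection over 1000+ pages is mostly a matter of getting the right chunks into the same context window.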

Thanks

u/Used-Complaint5672 — 10 days ago

First-time contribution: BiRefNet in the browser

Hi everyone, Alex, frontend developer here, finally having some time to dip my toes into running ML models in the browser. I'm building a proof of concept segmentation / BG removal app in the browser with onnxruntime-web / transformers.js.

I hope this is the right place to post this. If not, please direct me to the right subreddit :)

I am able to run SAM3 in the browser on WebGPU no problem, and also Bria's RMBG-1.4 (great model!) runs fine. However, RMBG is not MIT licensed, and I wanted to build a fully free stack, so I ended up with BiRefNet.

Unfortunately, I could not get the 1024x1024 BiRefNet lite model to run on either WebGPU (not enough storage buffers) or WASM (out-of-memory error), so I figured out how to resize the model to 512x512. It took a lot of trial and error, since BiRefNet uses deform_conv2d, which is no longer available in a modern Python stack. I had to run it through Docker (ouch!) to get the right export.

But, with this new export it works in onnxruntime-web, which makes me very happy! It is unfortunately a little low on resolution but it runs reliably on my Macbook Pro M1. I'm curious if this is at all useful to anyone, and if the model card is in a format that is clear and useful. Also, if anyone has any idea on how to get the resolution higher without crashing the onnx runtime, that would be amazing.

Here is the link: https://huggingface.co/studioludens/birefnet-lite-512

Any feedback is more than welcome!

u/Affectionate-Peak975 — 5 days ago

I gave Reachy Mini a custom 3D printed outfit, then built and deployed a live object detection app on her camera.

https://www.youtube.com/watch?v=2D_EAcDgPEI

Reachy Mini is a collaboration between Pollen Robotics, Hugging Face, and Seeed Studio. All open source, including the body files. I got a beta developer unit through the Rerun office and have been playing with it for the past few weeks.

A few things I didn't expect going in:

- The multicolor 3D printing for something like text on a curved surface is genuinely tricky to get right

- The app ecosystem is more interesting than I thought. The constraint of no hands and no legs forces creative solutions

- Running a local model vs. connecting to a cloud LLM is a real tradeoff for a home robot, especially if kids are involved

The full code walkthrough (TensorFlow + PyCharm setup) is coming to the PyCharm channel as a companion video.

u/Growth-Sea — 7 days ago

KIV: 1M token context window on an RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - Works with any model that uses DynamicCache

Been working on this for a bit and figured it was ready to share. KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard KV cache in HuggingFace transformers with a tiered retrieval system. The short version: it keeps recent tokens exact in VRAM, moves old K/V to system RAM, and uses K vectors as a search index to pull back only the ~256 most relevant V entries per decode step.

Results on a 4070 12GB with Gemma 4 E2B (4-bit):

  • 1M tokens, 12MB KIV VRAM overhead, ~6.5GB total GPU usage
  • 8-10 tok/s at 1M context (GPU time)
  • 70/70 needle-in-haystack tests passed across 4K-32K
  • Perfect phonebook lookup (unique names) at 58K tokens
  • Prefill at 1M takes about 4.3 minutes (one-time cost)
  • Decode is near-constant regardless of context length

The core finding that makes this work: K vectors are smooth and structured, which makes them great search indices. V vectors are high-entropy and chaotic, so don't try to compress them, just retrieve them on demand. Use K to decide which V entries deserve to exist in VRAM at any given step.
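The retrieval step described above can be sketched as: score the archived K vectors against the current query, materialize only the top-N matching V entries, and attend over recent-exact plus retrieved. A toy numpy version; the shapes, sizes, and single-head layout are illustrative, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# "Archived" cache (would live in system RAM) and recent tokens (VRAM).
K_old, V_old = rng.standard_normal((10_000, d)), rng.standard_normal((10_000, d))
K_recent, V_recent = rng.standard_normal((256, d)), rng.standard_normal((256, d))

def kiv_step(q, top_n=256):
    """One decode step: use K as a search index, pull back only the
    top-N archived V entries, then attend over recent + retrieved."""
    idx = np.argpartition(K_old @ q, -top_n)[-top_n:]   # K-indexed lookup
    K = np.vstack([K_recent, K_old[idx]])
    V = np.vstack([V_recent, V_old[idx]])
    w = np.exp(K @ q / np.sqrt(d))
    w /= w.sum()                                        # softmax weights
    return w @ V                                        # attention output

out = kiv_step(rng.standard_normal(d))
```

The point of the sketch: attention cost per step depends on `top_n + recent`, not on total context length, which is why decode stays near-constant as context grows.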

No model weights are modified. No retraining or distillation. It hooks into the HuggingFace cache interface and registers a custom attention function. The model has no idea it's talking to a tiered memory system. Works with any model that uses DynamicCache. Tested on Gemma 4, Qwen2.5, TinyLlama, and Phi-3.5 across MQA/GQA/MHA.

There are real limitations and I'm upfront about them in the repo. Bounded prefill loses some info for dense similar-looking data. Collision disambiguation doesn't work but that's the 4-bit 2B model struggling, not the cache. Two-hop reasoning fails for the same reason. CPU RAM scales linearly (5.8GB at 1M tokens).

Still actively optimizing decode speed, especially at longer contexts. The current bottleneck is CPU-to-GPU transfer for retrieved tokens, not the model itself. Plenty of room to improve here.

GitHub: https://github.com/Babyhamsta/KIV (can be installed as a local pip package, no official pip package yet)

Happy to answer questions about the architecture or results. Would love to see what happens on bigger models with more VRAM if anyone wants to try it.

u/ThyGreatOof — 11 days ago

We built a 70-year longitudinal dataset covering 4M+ companies and structured it specifically for AI ingestion.

Most workforce datasets are built for analysts.

Ours is built for models.

We’ve spent years assembling a longitudinal company intelligence dataset:

- 4M+ companies across 100+ countries

- 48M+ company-year records spanning 1950–2020

- Three intelligence layers joined into a single flat file

- Signal flags renamed for neutral, AI-readable language

- Pre-COVID window (2018–2020) is the densest and most immediately useful

We call it the AI Foundation Layer:

The insight that changed how we pitch it: we fed the data to a language model and asked it to answer questions about specific companies. Without the dataset, narrative guesses. With it: precise, structured, verifiable answers about headcount trajectories, revenue bands, geographic expansion, and sector pivots going back decades.

That’s the delta. The model doesn’t need to hallucinate history. It already has it.

The dataset is available on Hugging Face as a sample (search for Vivameda).

Would love feedback from builders here, what signals matter most to you when working with company-level longitudinal data?

u/Cryptogrowthbox — 8 days ago

HuggingFace Pro is eating my wallet but I can't quit it — found a smarter way though

HuggingFace just dropped new gated model access features and Inference API upgrades, and the community is going wild debating whether the Pro tier is actually worth it now. With Llama 3, Mistral, and Zephyr models all requiring paid access for serious throughput, the cost conversation is getting real fast.

I was burning through my budget just to experiment with fine-tuning pipelines and running inference on larger models. Felt ridiculous paying full price when I only needed access a few times a week for side projects.

That's when I stumbled onto Anexly — it's basically a shared subscription platform where verified members split the cost of premium tools like HuggingFace Pro. Legit community, refund-backed, and nobody loses full access.

  • 👥 1 account shared among verified members
  • 💸 Everyone pays less while keeping full access
  • 🔒 Safe, private, and refund-backed
  • 🧾 Works for popular premium services

👉 https://linktr.ee/anexly

u/zq-a — 10 days ago