u/jacek2023

Nemotron-Labs-Diffusion from NVIDIA

Nemotron-Labs-Diffusion from NVIDIA

Model Overview

Nemotron-Labs-Diffusion is a tri-mode language model that supports both AR decoding and diffusion-based parallel decoding by simply switching the attention pattern of the same model during inference. The synergy between these two modes enables a third mode, called self-speculation: the same model performs diffusion-based parallel drafting and AR verification with shared KV cache, achieving high acceptance lengths and decoding efficiency. The seamless mode switching by simply changing attention patterns enables high efficiency at different concurrency levels in varying deployment scenarios with one single model.

https://preview.redd.it/mwyq7b7hx42h1.png?width=3915&format=png&auto=webp&s=744bd87267338a6236269a8d915b185cff8a82d2

Highlights

  • SOTA 3B, 8B, 14B dense LM family (base, instruct, and vision-language variants) supporting AR, diffusion, and self-speculation with the focus on decode efficiency.
  • Generation moved from a memory-bound regime toward a compute-bound regime. Model weights are loaded once and reused to compute multiple tokens during generation.
  • Self-speculation uses diffusion for drafting and AR for verification, providing a stronger alternative to MTP approaches:
    • 3x higher acceptance length and 2.2x speed-up vs. Qwen3-8B-Eagle3 in SGLang.
    • 5.9× tokens per forward over Qwen3-8B (no MTP) with the same accuracy.
  • Real-device speed-up across platforms:
    • DGX Spark (8B, concurrency 1): 2.7x faster with 112 tok/sec vs. 41.8 tok/sec AR using w4a16.
    • GB200 (8B, concurrency 1): 3.3x faster with 850 tok/sec vs. 253 tok/sec AR and 360 tok/sec Eagle3. Custom CUDA kernels boost to 1015 tok/sec (4x).
  • Diffusion speedup-of-light analysis shows that throughput can be further doubled (vs. current best) for a single user with better sampling - future research.

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-VLM-8B

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-14B-Base

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-14B

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-8B-Base

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-8B

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-3B-Base

https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-3B

reddit.com
u/jacek2023 — 13 hours ago

A Forest of Stars - Sway, Draped In Vague

the more I listen to the new album by A Forest of Stars the better it sounds

youtu.be
u/jacek2023 — 6 days ago

server, webui: support continue generation on reasoning models by ServeurpersoCom · Pull Request #22727 · ggml-org/llama.cpp

now you can CONTINUE

github.com
u/jacek2023 — 3 days ago

https://huggingface.co/XiaomiMiMo/MiMo-V2.5

Model Summary

  • Architecture: Sparse MoE (Mixture of Experts), 310B total / 15B activated parameters
  • Context Length: Up to 1M tokens
  • Modalities: Text, Image, Video, Audio
  • Vision Encoder: 729M-param ViT (28 layers: 24 SWA + 4 Full)
  • Audio Encoder: 261M-param Audio Transformer (24 layers: 12 SWA + 12 Full)
  • Multi-Token Prediction (MTP): 329M parameters, 3 layers
u/jacek2023 — 13 days ago

Qwen/Qwen3.6-35B-A3B was released 22 days ago

Qwen/Qwen3.6-27B was released 15 days ago

Let's predict when we can expect the 9B and 122B versions

reddit.com
u/jacek2023 — 13 days ago

and they are not happy for some reason

the image shows the search result for: Google Chrome 4GB (it's possible that these articles are bullshit, I just know that some people found it in their Chrome)

u/jacek2023 — 14 days ago
▲ 540 r/povertyLocalLLaMA+1 crossposts

https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF

Mistral Medium 3.5 128B

Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights. Mistral Medium 3.5 replaces its predecessor Mistral Medium 3.1 and Magistral in Le Chat. It also replaces Devstral 2 in our coding agent Vibe. Concretely, expect better performance for instruct, reasoning and coding tasks in a new unified model in comparison with our previous released models.

Reasoning effort is configurable per request, so the same model can answer a quick chat reply or work through a complex agentic run. We trained the vision encoder from scratch to handle variable image sizes and aspect ratios.

Find more information on our blog.

Key Features

Mistral Medium 3.5 includes the following architectural choices:

  • Dense 128B parameters.
  • 256k context length.
  • Multimodal input: Accepts both text and image input, with text output.
  • Instruct and Reasoning functionalities with function calls (reasoning effort configurable per request).

Mistral Medium 3.5 offers the following capabilities:

  • Reasoning Mode: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
  • Vision: Analyzes images and provides insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
  • System Prompt: Strong adherence and support for system prompts.
  • Agentic: Best-in-class agentic capabilities with native function calling and JSON output.
  • Large Context Window: Supports a 256k context window.

We release this model under a Modified MIT License: Open-source license for both commercial and non-commercial use with exceptions for companies with large revenue.

Recommended Settings

  • Reasoning Effort:
    • 'none' → Do not use reasoning
    • 'high' → Use reasoning (recommended for complex prompts and agentic usage) Use reasoning_effort="high" for complex tasks and agentic coding.
  • Temperature: 0.7 for reasoning_effort="high". Temp between 0.0 and 0.7 for reasoning_effort="none" depending on the task. Generally, lower means answer that are more to the point and higher allows the model to be more creative. It is a good practice to try different values in order to improve the model performance to meet your demands.
u/jacek2023 — 20 days ago

https://forum.blackmagicdesign.com/viewtopic.php?f=21&t=235587&sid=4cddfcec2a8cd658797d0ff9da05b7c1

from forum:

We are pleased to announce the release of DaVinci Resolve Studio 21.0b2 . This release is available at no charge for existing customers from our support web site.

https://www.blackmagicdesign.com/support/family/davinci-resolve-and-fusion

For DaVinci Resolve 21.0b2, we have taken efforts to keep the project libraries compatible with DaVinci Resolve 20.3.2. While this allows you to access the project library with 20.3.2, individual projects created or opened in 21.0b2 will no longer be accessible in 20.3.2. We recommend a full project library backup as well as individual project backups before opening projects in 21.0b2.

What's New in DaVinci Resolve 21.0b2

The following features have been added or updated.

  • More consistent crop flip and rotate actions in the Photo page.
  • Improved handling of crop resolutions for NEF, CR3 and RAF.
  • Nikon NEF lens distortion and lens vignette controls.
  • Speed and quality improvements for IntelliSearch better mode analysis.
  • Improved face identification for IntelliSearch analysis on Windows ARM.
  • Improved default ease profiles for retime curves.
  • Keyframe editor uses visible clip extents for curve normalization.
  • Color font and emoji rendering in Fusion page is RCM aware.
  • Foveated rendering controls for Apple Vision Pro workflows.
  • Support for dragging multiple OGraf or Lottie assets into a timeline.
  • Action to switch to source viewer mode in the Photo page.
  • Double click effects to apply to current photo.
  • Photo album quick export now uses output color space.
  • Looks section in viewer tools in the bottom toolbar.
  • Navigate photo albums with left and right arrow key in Color page.
  • Trim pass render tag can be used in file name and path in the Deliver page.
  • Improved HEIC thumbnail refresh for photo albums on Windows and Linux.
  • Multiple Macro Editor improvements.
  • Address issue with exporting stills with original file name.
  • Address decode issues with some RAF and 12-bit NEF stills.
  • Address incorrect UltraNR noise profiles for images.
  • Address fine audio volume adjustment using shift key.
  • Address auto select controls not working for source timelines.
  • Address issue with world pose for Apple Vision Pro workflows.
  • Address stretched text issue with VR360 titles and subtitles.
  • Address black frames in visionOS Review output with MainConcept MV-HEVC encoder.
  • Address edge pixel artifacts in immersive video clips added to VR180 or VR360 timelines.
  • Improved default node positioning for imported and generated nodes.
  • Address multiple issues with Cryptomatte and 3d renderer.
  • Scripting API support to classify audio for media pool clips and folders.
  • Scripting API support for speaker detection in audio transcription.
  • General performance and stability improvements.

Minimum System Requirements for macOS

  • macOS 15 Sequoia or later.
  • 8 GB of system memory or 16 GB when using Fusion.
  • At least 16 GB for advanced AI tools.
  • At least 32 GB for background rendering and analysis.
  • For monitoring, Blackmagic Design Desktop Video 12.9 or later.
  • Apple Silicon based computer.

Minimum System Requirements for Windows

  • Windows 10 Creators Update.
  • 16 GB of system memory or 32 GB when using Fusion.
  • For monitoring, Blackmagic Design Desktop Video 12.9 or later.
  • Integrated GPU or discrete GPU with at least 4 GB of VRAM.
  • At least 16 GB VRAM for advanced AI tools.
  • At least 32 GB RAM and 12 GB VRAM for background rendering.
  • GPU which supports OpenCL 1.2 or CUDA 12.8.
  • AMD/Intel official drivers from your GPU manufacturer.
  • NVIDIA Studio driver 581.57 or newer.

Minimum System Requirements for Windows for Arm

  • Windows 11 for ARM.
  • Qualcomm Snapdragon X Elite series processor.
  • 16 GB of system memory or 32 GB for 4K or when using Fusion.

Minimum System Requirements for Linux

  • Rocky Linux 8.6.
  • 32 GB of system memory.
  • For monitoring, Blackmagic Design Desktop Video 12.9 or later.
  • Discrete GPU with at least 4 GB of VRAM.
  • At least 16 GB VRAM for advanced AI tools.
  • At least 32 GB RAM and 12 GB VRAM for background rendering.
  • GPU which supports OpenCL 1.2 or CUDA 12.8.
  • AMD official drivers from your GPU manufacturer.
  • NVIDIA Studio driver 580.119.02 or newer.

Rohit Gupta

DaVinci Resolve Software Development
Blackmagic Design

reddit.com
u/jacek2023 — 22 days ago

https://huggingface.co/ggml-org/NVIDIA-Nemotron-3-Nano-Omni

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows. It extends the Nemotron Nano family with integrated video+speech comprehension, Graphical User Interface (GUI), Optical Character Recognition (OCR), and speech transcription capabilities, enabling end-to-end processing of rich enterprise content such as meeting recordings, M&E assets, training videos, and complex business documents. NVIDIA Nemotron 3 Nano Omni was developed by NVIDIA as part of the Nemotron model family.

This model is available for commercial use.

This model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. For more information, please see the Training Dataset section below.

u/jacek2023 — 22 days ago

Tutorial from the Google guy,

I use very similar setup (llama.cpp instead of lmstudio)

u/jacek2023 — 23 days ago