u/44th--Hokage

🔥 Hot ▲ 62 r/accelerate

Terence Tao Argues A 'Copernican View Of Intelligence' Fits Better — Just As Earth Is Not The Center Of The Universe, Human Intelligence Is Not The Center Of All Cognition

u/44th--Hokage — 2 hours ago

Factory Workers Have Started To Wear Cameras On Their Heads To Film Their Motions To Train Robots On Their Assembly Workflow

u/44th--Hokage — 3 hours ago

Emad Mostaque: "Civil Unrest May Come Before The Job Loss. Lawmakers Are Already Getting Death Threats For Approving Data Centers."

u/44th--Hokage — 3 hours ago

Schmidhuber & Meta AI Present The "Neural Computer": A New Frontier Where Computation, Memory, And I/O Move Into A Learned Runtime State

##TL;DR:

Conventional computers execute explicit programs. Agents act over external environments. World models learn environment dynamics. Neural Computers (NCs) ask whether some of the runtime itself can move into the learning system.


##Abstract:

>We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. Unlike conventional computers, which execute explicit programs, agents, which act over external execution environments, and world models, which learn environment dynamics, NCs aim to make the model itself the running computer.

>Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse. As an initial step, we study whether early NC primitives can be learned solely from collected I/O traces, without instrumented program state. Concretely, we instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions (when available) in CLI and GUI settings.

>These implementations show that learned runtimes can acquire early interface primitives, especially I/O alignment and short-horizon control, while routine reuse, controlled updates, and symbolic stability remain open. We outline a roadmap toward CNCs around these challenges. If overcome, CNCs could establish a new computing paradigm beyond today's agents, world models, and conventional computers.


##Layman's Explanation:

A "Neural Computer" is built by adapting video-generation architectures to train a world model of an actual computer, one that directly simulates a computer interface. Instead of interacting with a real operating system, the model takes in user actions like keystrokes and mouse clicks alongside previous screen pixels and generates the next video frames. Trained solely on recorded input and output traces, it learned to render readable text and control a cursor, showing that a neural network can run as its own visual computing environment without a traditional operating system.
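A minimal sketch of the rollout loop this describes, with a trivial stand-in for the learned model (all names, shapes, and the echo rule below are my own illustration, not the paper's architecture or API):

```python
import numpy as np

# Toy sketch: the "computer" is a model that predicts the next screen frame
# from prior frames plus the user's input events, fed back autoregressively.

H, W = 24, 80  # toy "screen" resolution, e.g. a CLI character grid

def predict_next_frame(frames, action):
    """Stand-in for the learned video model: a trivial rule that echoes the
    typed character into the next frame. A real NC replaces this with a
    neural network trained on recorded I/O traces."""
    frame = frames[-1].copy()
    y, x = divmod(len(frames) - 1, W)
    frame[y % H, x] = action  # "render" the keystroke at the cursor
    return frame

def rollout(actions):
    """Autoregressive rollout: each predicted frame becomes context for the
    next prediction, so the model itself plays the running interface."""
    frames = [np.zeros((H, W), dtype=int)]  # blank screen to start
    for a in actions:
        frames.append(predict_next_frame(frames, a))
    return frames

frames = rollout([ord(c) for c in "ls -la"])  # 6 keystrokes -> 6 new frames
```

The point of the sketch is the control flow, not the model: once `predict_next_frame` is a trained network, the same loop is the "learned runtime" the abstract describes.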


######Link to the Paper: https://arxiv.org/pdf/2604.06425


######Link to the GitHub: https://github.com/metauto-ai/NeuralComputer


######Link to the Official Blogpost: https://metauto.ai/neuralcomputer/

u/44th--Hokage — 11 hours ago

🔥 Hot ▲ 172 r/accelerate

Demis Hassabis Believes AI Should Spread Gains Through Broad Ownership, Like Pension Or Sovereign Funds Investing In AI. If AI-Driven Productivity Gains Cluster At The Top, Redistribution Must Widen The Benefits.

u/44th--Hokage — 19 hours ago
🔥 Hot ▲ 245 r/LocalLLaMA+1 crossposts

National University of Singapore Presents "DMax": A New Paradigm For Diffusion Language Models (dLLMs) Enabling Aggressive Parallel Decoding.

##TL;DR:

DMax cleverly mitigates error accumulation by reforming decoding as a progressive self-refinement process, allowing the model to correct its own erroneous predictions during generation.


##Abstract:

>We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings.

>At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding. We represent each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revising in embedding space.

>Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves TPF on GSM8K from 2.04 to 5.47 while preserving accuracy. On MBPP, it increases TPF from 2.71 to 5.86 while maintaining comparable performance. On two H200 GPUs, our model achieves an average of 1,338 TPS at batch size 1.


##Layman's Explanation:

The core idea is that diffusion language models should be able to generate text faster than normal LLMs because they can fill in multiple tokens at the same time. In practice, though, that speed advantage gets limited because early wrong guesses tend to snowball. Once the model commits to a bad token, that bad token becomes part of the context for the next step, so quality can fall apart fast when decoding gets too aggressive. What DMax does is give the model a better way to recover from its own mistakes. Instead of moving in a rigid one-way path from masked slots to final tokens, it lets the model keep refining intermediate guesses before locking them in.

The paper’s two main ideas are pretty intuitive. First, the model is trained on its own imperfect predictions, so it learns how to clean up the kinds of errors it will actually make at inference time. Second, during decoding it uses a softer in-between representation rather than treating every guess as fully final right away, which helps preserve uncertainty and makes revision easier. The result is that DMax pushes much more parallel decoding without the usual collapse in quality. On the paper’s math and coding benchmarks, it gets large speedups while keeping accuracy close to the original model, and in some lower-parallel settings it even improves accuracy a bit. So the main takeaway is not just “faster diffusion LLMs,” but diffusion LLMs that can revise themselves well enough to make aggressive parallel decoding actually practical.
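A minimal sketch of how such a soft, revisable decoding loop could look (the toy model, the alpha schedule, and the update rule are my assumptions for illustration, not DMax's actual algorithm):

```python
import numpy as np

# Each position's state is an interpolation between the mask embedding and
# the model's current predicted token embedding, so early guesses stay
# revisable across passes instead of being committed immediately.

rng = np.random.default_rng(0)
V, D, L = 16, 8, 5              # vocab size, embedding dim, sequence length
E = rng.normal(size=(V, D))     # token embedding table
mask_emb = np.zeros(D)          # embedding of the [MASK] token

def toy_model(states):
    """Stand-in predictor: score each position's state against the embedding
    table and return a softmax distribution over the vocabulary."""
    logits = states @ E.T
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

states = np.tile(mask_emb, (L, 1))       # all positions start fully masked
for alpha in (0.3, 0.6, 1.0):            # progressively trust the predictions
    probs = toy_model(states)            # predict every position in parallel
    guess = probs @ E                    # expected token embedding per position
    # Soft update: slide from the mask embedding toward the predicted
    # embedding, leaving room to revise earlier guesses on the next pass.
    states = (1 - alpha) * mask_emb + alpha * guess

tokens = toy_model(states).argmax(axis=-1)  # commit to hard tokens at the end
```

The design choice this illustrates: because intermediate states live in embedding space rather than as hard tokens, a wrong early guess can still be pulled back toward something better before it poisons the context.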


######Link to the Paper: https://arxiv.org/pdf/2604.08302


######Link to the GitHub: https://github.com/czg1225/DMax


######Link to the Models: https://huggingface.co/collections/Zigeng/dmax-models


######Link to the Training Dataset: https://huggingface.co/collections/Zigeng/dmax-training-data

u/44th--Hokage — 1 day ago

What Happened When Startup Axiom Math's AI Took The Putnam Competition? At 3:58PM, Two Minutes Before The Deadline, Axiom Solved Its 8th Problem. CEO Carina Hong Really Wanted 9: "So The Next Morning, I Just Did The Right Thing, I Announced It On Twitter. Two Hours Later, We Got The 9th."

##The Putnam Story:

>"We got the exam, and we are like, 'We are screwed,'" CEO Carina Hong remembers of that December day.

>Famed mathematician Ken Ono, who now works at Axiom, rallied the troops: "There's no room for purity! We are in a sports situation!"

>"It's really funny for a pure number theorist to say that," says Hong.

>At 3:58pm, two minutes before the deadline, Axiom solved its 8th problem. Hong really wanted 9. She debated whether to announce the news.

>"So the next morning, I just did the right thing, I announced it on Twitter. Two hours later, we got the ninth."


##Synopsis:

A year ago, Morgan Prize-winning math prodigy Carina Hong was working on her PhD at Stanford. Now she runs Axiom Math, an AI math startup valued at $1.6 billion. Axiom has already solved complex math problems, and Hong hopes it can help researchers solve more. But her ambitions go beyond that: to solve AI-generated code's 'slop' problem by running math-based verification in real time.

On The Upstarts Podcast, Hong shares her founder journey from immigrant to MIT and Stanford; why most perceptions of code verification are wrong; and how she's learning on the job to help Axiom compete in a red-hot new category. Plus, she shares her Upstart Moment as Axiom's tools took on the world's hardest college-level math test.


####Links to the Full Interview:

######YouTube: https://www.youtube.com/watch?v=I6RdTGvdbuM


######Spotify: https://open.spotify.com/episode/274av40lepgFI8xI3mhL8s?si=wJoMmnJqQfalTJKQQaGAhw&t=0&pi=svdIicThT7igh


######Apple: https://podcasts.apple.com/us/podcast/axioms-carina-hong-solving-maths-hardest-problems-with/id1875709419?i=1000758833957

u/44th--Hokage — 1 day ago
🔥 Hot ▲ 151 r/accelerate

DISCUSSION: Anthropic Has Internal "Mythos". OpenAI Has Internal "Spud". Elon Says xAI Is Training 6t And 10t Models. What Do You Think Google Has Internally?

u/44th--Hokage — 1 day ago
🔥 Hot ▲ 76 r/accelerate

Starting Next Week Unitree Will Start Selling Its Cheapest Humanoid Robot, The 123-Cm-Tall, 27-Kg "R1 Humanoid Robot", For $5,900 On Alibaba’s Aliexpress For International Markets Incl. North America, Europe, & Japan

u/44th--Hokage — 1 day ago
🔥 Hot ▲ 84 r/LocalLLaMA+1 crossposts

[Oldie-But-A-Goodie] META Presents "TRIBE v2": A Next-Gen Model That Acts As A Digital Twin Of Human Neural Activity

##TL;DR:

META's New AI Can Predict Your Brain Better Than A Brain Scan.


##Abstract:

>Cognitive neuroscience is fragmented into specialized models, each tailored to specific experimental paradigms, hence preventing a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions.

>Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, superseding traditional linear encoding models, delivering several-fold improvements in accuracy.

>Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration.

>These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.


##Layman's Explanation:

TRIBE v2 is a foundation model trained on 1,000+ hours of brain imaging data from 720 people. You feed it a video, sound clip, or text, and it predicts:

  • Which brain regions light up

  • How strongly

  • And in what order

When tested on people it had never seen, the model's predictions were actually more accurate than most real brain scans (which get distorted by heartbeats, breathing, and movement). Researchers then replicated decades of classic neuroscience experiments entirely inside the software.

No scanner, no human subjects.

The model correctly identified the brain's face recognition center, language network, and emotional processing regions on its own.
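For scale: the "traditional linear encoding models" the abstract says TRIBE v2 supersedes are essentially ridge regressions from stimulus features to each voxel's fMRI response. A few-line sketch (all sizes and data below are simulated toys, not the paper's features or pipeline):

```python
import numpy as np

# Classic encoding-model baseline: fit a linear map from stimulus features
# (e.g. audio/video/text embeddings per fMRI timepoint) to voxel responses.

rng = np.random.default_rng(0)
T, F, Vx = 200, 32, 10              # timepoints, stimulus features, voxels
X = rng.normal(size=(T, F))         # stimulus features, one row per TR
W_true = rng.normal(size=(F, Vx))
Y = X @ W_true + 0.1 * rng.normal(size=(T, Vx))  # simulated voxel responses

lam = 1.0                           # ridge penalty
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(F), X.T @ Y)

# Encoding accuracy: per-voxel correlation between predicted and observed.
pred = X @ W_hat
r = [float(np.corrcoef(pred[:, v], Y[:, v])[0, 1]) for v in range(Vx)]
```

TRIBE v2's claim is that one tri-modal foundation model beats this kind of per-paradigm linear fit by several fold, across stimuli, tasks, and subjects.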

####My Thoughts:

Look at what else Meta has been building:

  • Ray-Ban smart glasses that see and hear what you do

  • A wristband that reads nerve signals

  • And now a model that predicts how your brain responds to any piece of content

There's no evidence these are all connected, but regardless, Meta now has a complete picture of attention, from the stimulus to the neural response.


######Link to the Paper: https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/


######Link to the GitHub: https://github.com/facebookresearch/tribev2


######Link to the Open-Sourced Weights: https://huggingface.co/facebook/tribev2

u/44th--Hokage — 1 day ago

MIT Presents "Exponential Quantum Advantage In Processing Massive Classical Data": Small Quantum Computers Beat Exponentially Larger Classical Machines

####TL;DR:

Our results provide strong evidence that the quantum method achieves strong performance with fewer than 60 logical qubits, at a machine size four to six orders of magnitude smaller than the classical and QRAM-style baselines on the main real-world datasets. Rather than fearing that classical AI will “eat quantum computing’s lunch,” we now have rigorous evidence pointing toward a much more exciting prospect: quantum-enhanced AI overpowering classical AI.


####Abstract:

>Broadly applicable quantum advantage, particularly in classical data processing and machine learning, has been a fundamental open problem. In this work, we prove that a small quantum computer of polylogarithmic size can perform large-scale classification and dimension reduction on massive classical data by processing samples on the fly, whereas any classical machine achieving the same prediction performance requires exponentially larger size. Furthermore, classical machines that are exponentially larger yet below the required size need superpolynomially more samples and time.

>We validate these quantum advantages in real-world applications, including single-cell RNA sequencing and movie review sentiment analysis, demonstrating four to six orders of magnitude reduction in size with fewer than 60 logical qubits. These quantum advantages are enabled by quantum oracle sketching, an algorithm for accessing the classical world in quantum superposition using only random classical data samples.

>Combined with classical shadows, our algorithm circumvents the data loading and readout bottleneck to construct succinct classical models from massive classical data, a task provably impossible for any classical machine that is not exponentially larger than the quantum machine. These quantum advantages persist even when classical machines are granted unlimited time or if BPP=BQP, and rely only on the correctness of quantum mechanics.

>Together, our results establish machine learning on classical data as a broad and natural domain of quantum advantage and a fundamental test of quantum mechanics at the complexity frontier.


####Layman's Explanation:

This paper claims an end-to-end exponential quantum memory advantage on useful classical-data tasks, not just contrived oracle problems.

The central idea is quantum oracle sketching: a small fault-tolerant quantum computer does not store the full dataset and does not rely on QRAM. Instead, it processes ordinary classical samples one at a time, applies incremental coherent updates, discards the samples, and builds the quantum query access needed to run quantum linear-algebra-style routines on massive data streams. The readout side is handled with interferometric classical shadows, so the output is a compact classical model rather than an unreadable quantum state.

The paper’s theoretical claim is that this gives a small quantum machine enough leverage to solve three broad classes of tasks on massive classical data: linear systems, binary classification, and dimension reduction. For the static versions of those tasks, they claim a quantum computer of poly(log N) or poly(log D) size can succeed with about O(N) samples, while any classical machine matching the same performance needs exponentially larger memory. For the dynamic versions, where the observed data distribution changes over time but the underlying target structure stays roughly fixed, they claim sub-exponentially smaller classical machines would need superpolynomially more samples to keep up.
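Schematically, the resource claim in the paragraphs above can be written as follows (my paraphrase; the exact bounds, constants, and the dynamic-setting sample lower bounds are in the paper):

```latex
\underbrace{\mathrm{poly}(\log N)\ \text{qubits},\ \ \sim O(N)\ \text{samples}}_{\text{quantum machine}}
\qquad \text{vs.} \qquad
\underbrace{\text{exponentially larger memory}}_{\text{any classical machine at matching accuracy}}
```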


######Link to the Paper: https://arxiv.org/pdf/2604.07639


######Link to the Official Blogpost: https://quantumfrontiers.com/2026/04/09/unleashing-the-advantage-of-quantum-ai/

u/44th--Hokage — 1 day ago
🔥 Hot ▲ 173 r/accelerate

Ray Kurzweil On Why 2032 Could Be The Year Humans Stop Aging

##A Short Explanation of Kurzweil's Position On Technologically-Enabled Longevity:

He calls it "longevity escape velocity" (or, as everyone here calls it, LEV): the point where breakthroughs add more years to your life than time takes away.

"By around 2032, people who are diligent with their health are going to reach what we call longevity escape velocity. This is when scientific breakthroughs will add more time to our remaining life expectancy than is going by. So we could be going backwards in time as far as our health is concerned."

In other words, aging stops being a one-way street.

The engine behind this shift isn't just better medicine. It's AI doing what humans never could, testing billions of possible treatments at once.

"We'll soon have the ability to rapidly test billions of possible molecular sequences to find cures ultimately for all diseases."

Centuries of medical research, compressed into years. That's the scale of change he's describing.

And this future doesn't replace humanity.

According to Kurzweil, it extends it.

"As we emerge with AI in this way, we will become a hybrid species. We will still be human but will be enhanced by AI."

But perhaps the most grounding part of his argument isn't scientific at all. It's personal.

"I want to live indefinitely because I want to see my loved ones and I want to continue working on my creative projects. I don't see a time when I would not feel that way."

At its core, his motivation is simple: he just wants to keep showing up for the people and work that matter most to him.

u/44th--Hokage — 1 day ago


OpenAI Chief Scientist Jakub Pachocki Sits Down To Discuss OpenAI's Research Roadmap To AGI Including A Research Intern-Level AI System By September 2026 & A Fully Automated AI Researcher By March 2028

##Synopsis:

>Jakub Pachocki, OpenAI's Chief Scientist, sits down with Jacob to cover the full arc of where AI research stands today and where it's headed.

>The conversation spans the explosive growth of coding agents and what it signals about near-term AI capability, the use of math and physics benchmarks as proxies for general intelligence, how reinforcement learning is being extended beyond easily-verified domains toward longer-horizon tasks, and what it means to run a research organization at the precise moment the models themselves are starting to accelerate the research.

>Jakub shares a candid take on the competitive landscape, why chain-of-thought monitoring is one of the most promising tools in the alignment toolkit, and — with unusual directness — why the concentration of power enabled by highly automated AI organizations is a societal problem that doesn't yet have an obvious solution.


##Link to the Full Interview:

######YouTube: https://www.youtube.com/watch?v=vK1qEF3a3WM


######Spotify: https://open.spotify.com/episode/7lir6GXl0FiQDg81B41Q0L


######Apple: https://podcasts.apple.com/fi/podcast/ep-84-openais-chief-scientist-on-continual-learning/id1672188924?i=1000760465610

u/44th--Hokage — 1 day ago
🔥 Hot ▲ 260 r/accelerate

Demis Hassabis Says The Brain Is Likely An Approximate Turing Machine. So Far, Neuroscience Has Not Found Quantum Effects In The Brain, Which Leaves Open The Possibility That AI Could Eventually Mimic Much More Of Human Cognition

Link to the Full Interview: https://www.youtube.com/watch?v=C0gErQtnNFE

u/44th--Hokage — 2 days ago

Claude Mythos: Highlights from 244-page Release | AI Explained

##Synopsis:

>The model, the Mythos, the legend. We have a new best AI model, but not all of us. How good is it, and what do its new offensive capabilities mean? Why does its 244-page report card remind me of Her, and why did the creator of Claude Code call it 'terrifying'? 30+ highlights sourced by reading the paper in full, old-school, no AI summary:

>https://80000hours.org/aiexplained

youtube.com
u/44th--Hokage — 2 days ago
🔥 Hot ▲ 69 r/accelerate

ByteDance Presents "In-Place TTT": A Drop-In Method For Turning Standard Transformer LLMs Into Dynamically Updating Models At Inference Time

####TL;DR: In-Place TTT is a drop-in method for turning standard Transformer LLMs into dynamically updating models at inference time, and the paper shows that this actually moves long-context benchmarks rather than just sounding elegant on paper.


####Abstract:

>The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast-weight objectives for language modeling.

>In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch.

>Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically-grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism.

>Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results further provide deeper insights on our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.


####Layman's Explanation: In-Place TTT is a way to give a normal Transformer LLM a form of online memory at inference time without replacing the architecture or retraining a totally different model. Instead of adding a separate recurrent memory module, it repurposes the MLP block’s final projection matrix as fast weights and updates those weights in-place, chunk by chunk, while keeping standard attention intact.

The key trick is that it does not train those fast weights to merely reconstruct the current token; it uses a next-token-prediction-aligned objective so the temporary memory is storing information that is actually useful for language modeling. The result is a drop-in TTT method that is compatible with context parallelism and designed to scale on modern hardware.
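As a rough mental model (my own toy, not the paper's algorithm or its derived objective), the chunk-wise fast-weight update looks like this:

```python
import numpy as np

# Treat one projection matrix as "fast weights" and nudge it in place,
# chunk by chunk, as the token stream arrives. The least-squares surrogate
# loss, chunk size, and learning rate are illustrative assumptions; the
# paper uses its own next-token-prediction-aligned objective.

rng = np.random.default_rng(0)
D_in, D_out, chunk = 16, 16, 8
W_fast = rng.normal(scale=0.1, size=(D_in, D_out))  # stand-in for the MLP's final projection
W0 = W_fast.copy()

def ttt_update(W, h, target, lr=0.1):
    """One chunk-wise fast-weight step: gradient descent on
    0.5 * ||h @ W - target||^2 averaged over the chunk."""
    err = h @ W - target          # (chunk, D_out) residuals
    grad = h.T @ err / len(h)     # dL/dW
    return W - lr * grad

# Simulated stream of hidden states with consistent prediction targets.
W_true = rng.normal(scale=0.1, size=(D_in, D_out))
stream_h = rng.normal(size=(8 * chunk, D_in))
stream_t = stream_h @ W_true

losses = []
for i in range(0, len(stream_h), chunk):            # walk the context in chunks
    h, t = stream_h[i:i + chunk], stream_t[i:i + chunk]
    losses.append(float(np.mean((h @ W_fast - t) ** 2)))  # pre-update loss
    W_fast = ttt_update(W_fast, h, t)               # adapt in place for the next chunk
```

The "in-place" part is the whole point: no extra memory module is bolted on; an existing weight matrix simply keeps moving during inference.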

####Results: As a drop-in upgrade on Qwen3-4B, it improves RULER long-context performance from 74.3 to 78.7 at 64k, 74.8 to 77.0 at 128k, and 41.7 to 43.9 at 256k extrapolation. The paper also shows the same idea transfers to other bases, improving LLaMA-3.1-8B from 81.6 to 83.7 at 64k and Qwen3-14B from 67.9 to 70.6 at 64k.

When trained from scratch, it beats prior TTT-style and efficient-attention baselines on sliding-window perplexity at 500M and 1.5B, and at 4B it delivers large long-context gains like RULER-16k: 6.58 → 19.99 for full-attention transformers and RULER-8k: 9.91 → 26.80 for sliding-window transformers. The paper’s efficiency plots also claim the added throughput and memory cost is small enough to be practical.


######Link to the Paper: https://arxiv.org/pdf/2604.06169


######Link to the GitHub: https://github.com/ByteDance-Seed/In-Place-TTT

u/44th--Hokage — 2 days ago