
I wrote a paper on HoloKV: Using CDMA Phase-Shifting to achieve O(N/k) KV-Cache Compression. Looking for Triton/CUDA collaborators.
Hey everyone,
I’m a 22-year-old independent researcher, and I’ve been trying to tackle the "Memory Wall" for long-context LLMs. Standard methods either quantize the cache (which hits a hard precision floor) or evict tokens (which degrades long-range reasoning).
I just published an open research draft for a different geometric approach called HoloKV.
The concept: Instead of appending new memory slots, HoloKV multiplexes (stacks) k tokens into a single physical memory slot. It uses deterministic +1/-1 orthogonal phase keys (inspired by CDMA telecommunications) to separate the signals.
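To make the multiplexing idea concrete, here is a minimal numpy sketch of how I understand the superpose/extract step (variable names are my own, not from the repo). Note the cross-terms don't vanish; they survive as zero-mean noise, which is exactly what the denoising step below targets:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 4                      # head dim, tokens per physical slot

# Deterministic +1/-1 phase keys, one per multiplexed token
keys = rng.choice([-1.0, 1.0], size=(k, d))
tokens = rng.standard_normal((k, d))

# Multiplex: stack k tokens into ONE d-dim slot,
# with a sqrt(k) variance normalization
slot = (keys * tokens).sum(axis=0) / np.sqrt(k)

# Extract token j by re-applying its phase key (undo the normalization)
j = 2
extracted = keys[j] * slot * np.sqrt(k)

# Algebraic identity: extraction = target token + crosstalk noise,
# because keys[j] * keys[j] == 1 elementwise
crosstalk = sum(keys[j] * keys[i] * tokens[i] for i in range(k) if i != j)
assert np.allclose(extracted, tokens[j] + crosstalk)
```

The slot stays d-dimensional no matter how many tokens are stacked, which is where the O(N/k) memory figure comes from.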
To make it work natively with modern architectures, I introduced:
- Variance Normalization: A sqrt(k) scaling penalty that keeps the variance of the superimposed slot at single-token levels, preventing the Softmax entropy collapse that raw superposition would cause.
- Strict Even-Boundary Rule: A constraint on phase-key generation that preserves RoPE's 2D rotary structure exactly, so the keys commute with the rotary transform used by Llama/Qwen.
- LoRA Denoising: Training Query/Value LoRA adapters via knowledge distillation so the model natively filters out the Gaussian crosstalk noise left over from superposition.
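If I'm stating the even-boundary rule correctly, it means a phase key must hold a constant sign across both halves of each 2D rotary pair; under that constraint the key commutes with the RoPE rotation exactly. A minimal check (my own reconstruction, not code from the repo):

```python
import numpy as np

def rope(x, theta):
    """Rotate each (even, odd) pair of x by the corresponding angle in theta."""
    out = np.empty_like(x)
    out[0::2] = x[0::2] * np.cos(theta) - x[1::2] * np.sin(theta)
    out[1::2] = x[0::2] * np.sin(theta) + x[1::2] * np.cos(theta)
    return out

rng = np.random.default_rng(1)
d = 8
x = rng.standard_normal(d)
theta = rng.standard_normal(d // 2)

# Even-boundary key: the +1/-1 sign is constant across each 2D pair
pair_signs = rng.choice([-1.0, 1.0], size=d // 2)
key = np.repeat(pair_signs, 2)

# Rotate-then-key equals key-then-rotate: the key survives RoPE intact
assert np.allclose(rope(key * x, theta), key * rope(x, theta))

# A key that flips sign INSIDE a pair breaks the commutativity
bad_key = np.ones(d)
bad_key[1] = -1.0
assert not np.allclose(rope(bad_key * x, theta), bad_key * rope(x, theta))
```

The reason it works: scaling both components of a 2D pair by the same scalar c gives R(θ)(c·v) = c·R(θ)v, whereas a sign flip on only one component does not commute with rotation.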
The Ask:
I have built a mathematical simulator in PyTorch that verifies the orthogonal extraction and RoPE preservation. However, I am a solo dev working on a GTX 1650. To actually realize the 75%+ physical VRAM savings, the method needs a custom SRAM Active Accumulation Buffer written in OpenAI Triton or CUDA, to avoid the "Read-Modify-Write" penalty of repeatedly updating the slot in global memory.
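To make the kernel ask concrete: a naive implementation reads the slot from HBM, adds the phase-modulated token, and writes it back once per token, while the buffer I have in mind accumulates on-chip and writes each slot once. Both must produce identical slots; here is that equivalence condition in numpy (a sketch of the semantics, obviously not the kernel itself):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 64, 4
keys = rng.choice([-1.0, 1.0], size=(k, d))
tokens = rng.standard_normal((k, d))

# Naive: one read-modify-write of the (global-memory) slot per token
slot_naive = np.zeros(d)
for i in range(k):
    slot_naive += keys[i] * tokens[i]     # would be k HBM reads + k writes
slot_naive /= np.sqrt(k)

# Accumulation buffer: sum in fast on-chip storage, flush once
acc = np.zeros(d)                         # would live in SRAM/registers
for i in range(k):
    acc += keys[i] * tokens[i]
slot_fused = acc / np.sqrt(k)             # single HBM write per slot

# The fused path must be bit-for-bit a drop-in replacement
assert np.allclose(slot_naive, slot_fused)
```

The arithmetic is identical; the entire win is memory traffic, which is why this has to live inside a Triton/CUDA kernel rather than in PyTorch ops.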
I am open-sourcing the math and the paper. If there are any Triton/FlashAttention kernel engineers here who want to collaborate and help me build the hardware kernel, please reach out or open a PR!
**Paper & Code:** https://github.com/0sami0/HoloKV