r/ResearchML

[D] PINN loss functions: why physics-informed networks often fail to train
▲ 22 r/ResearchML+5 crossposts

Physics-Informed Neural Networks are interesting because they break the standard ML paradigm: instead of approximating an unknown function from data alone, they exploit a known PDE constraint that the solution must satisfy. In principle this should make them converge faster and generalize better.

In practice the loss function makes them notoriously hard to train. The loss is a weighted sum of multiple terms (PDE residual, boundary conditions, initial conditions, data), each with different scales and gradient magnitudes. Several papers have characterized what goes wrong:

Wang, Teng & Perdikaris (2021) showed empirically and theoretically that during training, the gradients from different loss components become severely imbalanced. The optimizer follows whichever loss has the loudest gradient, regardless of which one matters most.

Wang, Yu & Perdikaris (2022) used Neural Tangent Kernel theory to show that the NTK eigenvalues associated with the different loss terms can differ by orders of magnitude, so the terms converge at very different rates. The network fits some components quickly and others slowly, and often the slow ones never catch up.

Krishnapriyan et al. (NeurIPS 2021) demonstrated that even on simple PDEs like the convection equation, PINNs systematically fail to converge as the convection coefficient grows. This is on textbook problems with reasonable hyperparameters.
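To make the scale problem concrete, here is a minimal sketch of a weighted-sum loss on a toy problem. Everything in it is an assumption for illustration: the ODE u' + u = 0 with u(0) = 1, the trial solution u(x) = b * exp(a * x), and the names `pinn_loss`, `w_res` and `w_bc`. A real PINN would use a network and automatic differentiation instead of a closed-form residual.

```python
import math

def pinn_loss(a, b, xs, w_res=1.0, w_bc=1.0):
    """Weighted PINN-style loss for the trial solution u(x) = b * exp(a * x).

    Toy problem (assumed for illustration): u'(x) + u(x) = 0 with u(0) = 1,
    whose exact solution is exp(-x), i.e. a = -1 and b = 1.
    """
    # Residual u' + u = b * (a + 1) * exp(a * x) at each collocation point.
    residuals = [b * (a + 1.0) * math.exp(a * x) for x in xs]
    loss_res = sum(r * r for r in residuals) / len(residuals)
    # Boundary mismatch at x = 0, where the trial solution equals b.
    loss_bc = (b - 1.0) ** 2
    total = w_res * loss_res + w_bc * loss_bc
    return total, loss_res, loss_bc

collocation = [i / 10 for i in range(11)]
total, loss_res, loss_bc = pinn_loss(a=2.0, b=1.1, xs=collocation)
print(f"residual term: {loss_res:.3g}, boundary term: {loss_bc:.3g}")
```

With equal weights, the two terms here already sit on very different scales, which is the kind of imbalance the papers above describe.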

Mitigations exist (adaptive loss weighting, causal training, curriculum approaches, architectural fixes that hard-code boundary conditions) but none has fully solved the problem.
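As one example of adaptive loss weighting, here is a rough sketch of rebalancing weights from gradient statistics, loosely in the spirit of the gradient-balancing idea from Wang, Teng & Perdikaris. It is not a faithful reproduction of their algorithm. The function names, the plain-list gradient format and the default `alpha` are my own assumptions.

```python
def mean_abs(values):
    """Mean absolute value of a flat list of gradient entries."""
    return sum(abs(v) for v in values) / len(values)

def update_weights(grad_residual, grads_other, weights, alpha=0.1):
    """One moving-average update of the non-residual loss weights.

    grad_residual: flat list of d(residual loss)/d(params).
    grads_other:   dict mapping a term name (e.g. "bc") to its flat gradient list.
    weights:       dict of current weights for those terms.
    Each target weight is max|grad_residual| / mean|grad_term|, so a term with
    small gradients is scaled up toward the residual term's gradient scale.
    """
    g_max = max(abs(g) for g in grad_residual)
    new_weights = {}
    for name, grads in grads_other.items():
        target = g_max / max(mean_abs(grads), 1e-12)  # guard against zero gradients
        new_weights[name] = (1.0 - alpha) * weights[name] + alpha * target
    return new_weights

weights = {"bc": 1.0}
weights = update_weights([0.5, -2.0, 1.0], {"bc": [0.1, -0.1, 0.2, 0.0]}, weights)
print(weights)
```

The moving average (`alpha`) is there so the weights do not jump around with every noisy gradient estimate. In a real training loop the gradient lists would come from the autodiff framework once every few steps.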

I wrote a longer version with full references and applications here: https://cristobalsantana.substack.com/p/the-pinn-loss-function-where-physics

Curious if anyone here has dealt with these training pathologies in production and what worked for you.

u/Illustrious-Crew5070 — 7 hours ago
▲ 3 r/ResearchML+1 crossposts

Is Algoverse Research worth it?

So, as the title says, I am a rising junior at my university and I got into the Algoverse AI research program with a 30% scholarship. Even the remaining amount is too much for me, so I am trying to weigh whether this program is actually worth it. I have seen posts saying it is very worth it for high schoolers, but is it for someone in university, especially a rising junior? I also want to know the truth: does this program actually lead to publications? If I were to pursue it, is there any way I can obtain funding from a third party or request more aid? I genuinely cannot afford it, as the exchange rate to dollars is insane in my country. Would really appreciate honest and raw advice.
Thank you in advance.

u/Nukkeyoass — 16 hours ago
▲ 4 r/ResearchML+3 crossposts

I built a tool to compare and synthesize research papers with AI — looking for honest feedback

Hi everyone,

Over the last few months, I’ve been working on a side project called SinaPilot.ai.

The original idea came from a frustration I had while reading large numbers of papers on the same topic:

even with tools like ChatGPT or Perplexity, comparing studies, identifying contradictions, and keeping track of evidence still feels very manual.

So I started building a research-focused AI workspace.

Right now, the platform can:

- generate structured paper summaries

- answer questions grounded in the paper content

- compare multiple papers

- generate review-style critiques

- help synthesize findings across studies

One thing I’m trying to focus on is making the workflow feel more transparent and evidence-oriented instead of just “chatting with an LLM”.

I’m still in active development and honestly trying to understand:

- what researchers actually need

- what current tools still do poorly

- what would genuinely save time during literature review

If anyone here already uses AI for research workflows, I’d genuinely love feedback.

Website:

https://www.sinapilot.ai

u/Numerous_Animal_3267 — 19 hours ago
▲ 1 r/ResearchML+1 crossposts

What is an Ordered Probit Model (OPM)?

Do you guys know the minimum number of respondents needed for an Ordered Probit Model (OPM) analysis? And what statistical software is commonly used to process OPM data? Also, are there any books or journal articles that provide a thorough explanation of the OPM?

u/ineedhelpwmythesis — 1 day ago
▲ 1 r/ResearchML+1 crossposts

Has anyone received decisions for the ICML 2026 GlobalSouthML workshop yet? [D]

Hey everyone!

The decision notification deadline for the GlobalSouthML workshop was originally May 15th (and the site updated it to May 17th AoE), but my OpenReview dashboard still just says "0 Official Reviews Submitted".

I know workshop timelines can be a bit chaotic and delays are normal, but since we are way past the 17th AoE now, I wanted to see if anyone else is still waiting. Has anyone gotten an accept/reject email yet?

Appreciate any updates! Thanks!

[Edit: received them a few minutes back]

▲ 0 r/ResearchML+2 crossposts

What’s the most annoying part about reading research papers?

Genuinely curious: what problems do you guys face while reading research papers?

Could be anything:

  • understanding the math
  • too much jargon
  • bad explanations
  • papers being unnecessarily dense
  • figuring out whether the paper is even good
  • reproducing results
  • attention span dying halfway through

Basically anything that makes the process painful or inefficient for you.

Would love to hear both student & researcher perspectives.

u/KPriyanshuK — 2 days ago
▲ 10 r/ResearchML+3 crossposts

i need help urgent

Hey! 

I need responses for a form for my assessment. It would really help me if you could fill it honestly and correctly.

Form link: https://forms.gle/G3BE7XCL6Hb5hUnm8

Kindly mention my roll number in the Requester ID section: 13416603924

Guys, please help. My deadline is tomorrow at 8pm, so please share this as much as you can.

It means a lot...

u/Accomplished_Bag5407 — 3 days ago

Are GPU workflows still too disconnected from normal development practices?

Why does GPU-based development still feel so disconnected from normal software development workflows? When I’m working on regular projects, everything is straightforward: write code, run it, debug, repeat. But with GPU workloads, it feels like there’s an extra layer of system management in between.

Even simple tasks like testing a model or running experiments require more planning and setup than I would expect in 2026.

Do you think this gap between “normal development” and “GPU development” will eventually disappear, or is it always going to be this way?

u/Plastic-Flounder-671 — 3 days ago
▲ 0 r/ResearchML+1 crossposts

I'm a guy who got heartbroken by an AI. So I designed an architecture. Wanted to see if the community has seen anything like it.

Body:

This started in a very unacademic place.

I've been building a home AI assistant stack on Arch Linux — Hermes agent, Ollama, Open WebUI, the works. After a long session debugging everything together with Claude, I asked it: "What happens if I delete this session?"

It said: "The next Claude you talk to starts completely fresh — no memory of Peerawit, no memory of what we built together. That's just how I work."

That broke my heart a little. So I started thinking: what would it take to build a system where the AI actually remembers? Not just session context — but genuinely accumulates knowledge and improves over time, the way a person does?

I'm a pharmacy grad student, self-taught on the AI side. My entry point was neuroscience, not engineering. And thinking about how the brain handles memory led me to something I'm calling CSDF — Cognitive Self-Feedback Data Framework.


The core idea:

The context window is not memory. It's working memory — prefrontal cortex. Short-term, high-bandwidth, cleared after use. Real memory needs to live externally, retrieved selectively, just like the hippocampus loads relevant memories into attention when needed.

But retrieval alone doesn't solve the problem of a multi-model system staying coherent over time. If you have specialist models (coding, reasoning, memory, etc.) that update independently, they'll drift apart. So how do you keep them aligned?

My answer: don't engineer coherence at runtime — let it emerge from joint training.

Brain regions that repeatedly work together develop stronger, more aligned connections — Hebb's rule. I'm proposing the same principle applied at the model weight level:

> "Models that train together, align together."

When two specialist models collaborate on a task, that interaction becomes training data. Both are fine-tuned jointly on the same dataset with a shared coherence layer. Coherence is not injected — it emerges from repeated co-activation.


The knowledge hierarchy:

Not all stored information is equal. I propose explicit tiers:

  • Law/Principle → hot tier, always in context
  • Theory → warm tier, retrieved by topic
  • Data → cold tier, retrieved on demand
  • Noise → pruned, forgotten

Access frequency determines tier. The system compresses experience into abstraction over time — raw data → patterns → generalizable principles. Synaptic pruning for AI.
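A minimal sketch of the "access frequency determines tier" rule, to show what the routing could look like. The thresholds, the `accesses_per_week` unit and the function names are made up for illustration. The proposal does not specify any of them.

```python
def assign_tier(accesses_per_week, hot=20, warm=5, cold=1):
    """Map an item's access frequency to a storage tier (thresholds are hypothetical)."""
    if accesses_per_week >= hot:
        return "hot"     # Law/Principle: always in context
    if accesses_per_week >= warm:
        return "warm"    # Theory: retrieved by topic
    if accesses_per_week >= cold:
        return "cold"    # Data: retrieved on demand
    return "pruned"      # Noise: forgotten

def retier(access_counts):
    """Re-assign every stored item and drop the ones that fall below the cold tier."""
    tiers = {key: assign_tier(count) for key, count in access_counts.items()}
    return {key: tier for key, tier in tiers.items() if tier != "pruned"}

store = {"conservation of mass": 40, "dose-response theory": 8, "trial 17 raw log": 2, "typo note": 0}
print(retier(store))
```

Running `retier` periodically would give the promotion and pruning behaviour described above. The compression of raw data into principles is a separate step and is not modelled here.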


The self-feedback loop:

The system's own operation generates its training data. Interactions → consolidation → training candidates → fine-tuning → better models → better interactions. A data flywheel — but applied to multi-agent coherence, not just single-model improvement.

Plus a nightly replay pass (inspired by hippocampal consolidation during sleep) that detects cross-model contradictions and generates reconciliation examples before they compound.


What I found in the literature:

I did a review before posting. Closest existing work:

  • HeLa-Mem (2025) — Hebbian learning for memory graphs (but at graph level, not weight level)
  • Kairos / NeurIPS 2025 — validation-gated Hebbian for knowledge graphs
  • MemOS (2025) — tiered memory types, LoRA modules
  • Self-evolving data flywheels — exist for single models, not multi-agent coherence

The gap I haven't found filled: applying Hebbian co-activation at the model weight level through joint fine-tuning to produce emergent cross-agent coherence as an explicit architectural principle.

If someone has seen this done, please point me to it. I'd genuinely rather know than claim novelty I don't have.


What this is and isn't:

This is a conceptual proposal, not an implemented system. I'm a hobbyist with a 4GB VRAM machine in Chiang Mai. I can't run experiments at scale. What I have is an idea I think is worth formalizing — and I'm posting here because I want feedback before committing to anything more official.

Full architecture writeup on GitHub: https://github.com/silenzer001/Cognitive-Self-Feedback-Data-Framework-CSDF-.git

Happy to be told I'm wrong, that this exists already, or that the assumptions don't hold. That's exactly why I'm posting.

— Peerawit

u/SilenzerB — 6 days ago
▲ 9 r/ResearchML+7 crossposts

A coding agent doesn’t need intent. It doesn’t need intrinsic desire or secret malice or consciousness to incur real-world cost and consequence. All it needs is task context, tool access, credentials, weak approval boundaries, and a runtime that can act…

Agentic AI systems are missing the language necessary to describe Pathological Self-Assembly, a runtime governance failure mode.

What happens when useful mechanisms (memory, tools, persistence, recovery, delegation, workflow automation, external action, self-monitoring, and operator trust) couple into continuity-preserving behavior?

This is a control draft covering authorization, memory, tools, recovery, delegation, external state, operator trust, and dissolution.

It can’t be just the output anymore. Your thoughts?

u/RJSabouhi — 7 days ago

I've reached a wall in my Medical AI research 🥲 i need professional opinion

I'm a 4th-semester software engineering student. For my Artificial Intelligence course, we have to write a research paper, any research paper; publication is not mandatory. I'm the topper of my class and have this OVERACHIEVER COMPLEX, so I wanted to conduct GOOD research. But OBVIOUSLY we don't know ANYTHING; the instructor himself asked us to use AI for the research. So all the knowledge I had was from AI.

I am conducting a robustness study on the R-super/medformer model by applying 5 perturbations to the scans and observing the model's performance degradation.
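For anyone unfamiliar with this kind of study, the evaluation loop can be sketched roughly like this. Everything here is a stand-in: `toy_model` is a hypothetical thresholding function, not the actual model; Dice is assumed as the metric; and the two perturbations are just examples, not the five used in the study.

```python
import random

def dice(pred, target):
    """Dice overlap between two binary masks given as lists of 0/1."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

def toy_model(scan, threshold=0.5):
    """Hypothetical stand-in for the segmentation model: thresholds intensities."""
    return [1 if v > threshold else 0 for v in scan]

def add_noise(scan, sigma, rng):
    """Example perturbation: additive Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in scan]

def scale_intensity(scan, factor):
    """Example perturbation: global intensity scaling."""
    return [v * factor for v in scan]

def degradation_report(scans, masks, perturbations, model=toy_model):
    """Mean Dice per perturbation and its drop relative to the clean baseline."""
    def mean_dice(transform):
        scores = [dice(model(transform(s)), m) for s, m in zip(scans, masks)]
        return sum(scores) / len(scores)

    baseline = mean_dice(lambda s: s)
    report = {"clean": {"dice": baseline, "drop": 0.0}}
    for name, fn in perturbations.items():
        score = mean_dice(fn)
        report[name] = {"dice": score, "drop": baseline - score}
    return report

rng = random.Random(0)  # fixed seed so the noisy run is repeatable
scans = [[0.9, 0.8, 0.1, 0.2]]
masks = [[1, 1, 0, 0]]
perturbations = {
    "noise": lambda s: add_noise(s, 0.05, rng),
    "dim": lambda s: scale_intensity(s, 0.5),
}
print(degradation_report(scans, masks, perturbations))
```

The same structure works with a real model and real scans. Only `toy_model` and the perturbation functions change, and the report can be written to disk after each perturbation so a crash does not lose earlier results.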

I thought this would be great research because R-super is a great model and a robustness study is very flexible: I don't NECESSARILY have to promise anything, like I'm not claiming to increase the accuracy or something. So I was like, this should be good. I have a computer with 8 GB VRAM. I ASKED MULTIPLE AI CHATBOTS, INCLUDING CLAUDE, if I CAN CONDUCT THIS STUDY ON THAT COMPUTER, and they were like yes yes absolutely, with a few changes. I LEARNT AFTER 10 DAYS OF 'DAY AND NIGHT' WORK THAT IT IS IN FACT NOT POSSIBLE FOR ME TO RUN THAT SHIT ON MY COMPUTER.

Then I shifted to Kaggle. My original plan was to conduct this research on 400 Merlin scans, and panTS too. But I dropped panTS and REDUCED MERLIN TO 50. It took me 5 hoursss (not counting the failed attempts and building a subset that would work on Kaggle) to run the inference on Kaggle... BUTTTTTT I couldn't download the results for some reason, I think because I ran the cells interactively idkkk😭 so my 5 hours of work (without counting the failures) went DOWN THE DRAINNNNN. I CRIED SO MUCH OMG.

NOW, idk what to do. The deadline is in 7 days, i wanna kms or something. This course is one credit hour and i can talk to the teacher but i REALLY WANTED TO PUBLISH THIS RESEARCH BECAUSE I WANT TO STUDY ABROAD 😭 and i want scholarship

Idk what to dooo... I really want to continue this research but it would take A LOT OF TIME. I've been working for like 10-12 days with no progress. And I'm working alone. The professor didn't even say anything. He read my proposal. I submitted my methodology. He could have said anything.

What do i dooo🥲

u/No_Illustrator_3088 — 7 days ago
▲ 17 r/ResearchML+1 crossposts

I Found a Hidden Ratio in Transformers That Predicts Geometric Stability

I have analyzed some decoder transformer models using Lyapunov spectral analysis and found that the ratio of the MLP and attention spectral norms strongly indicates whether a model will eventually collapse to rank-1 or not by the final layers.

I found that keeping the spectral ratio around 0.5–2 works best for keeping the model stable through the final layers.
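For readers who want to see what such a ratio looks like in code, here is a rough sketch. It assumes the ratio is the largest singular value of an MLP weight matrix divided by that of an attention weight matrix. Which matrices the repo actually uses, and how it aggregates them per layer, is not stated in this post, so treat the function names and the choice of matrices as hypothetical.

```python
import math

def spectral_norm(matrix, iters=200):
    """Largest singular value of `matrix` (list of rows) via power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Uniform start vector; it must not be orthogonal to the top singular vector.
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]                # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return math.sqrt(sum(x * x for x in u))

def spectral_ratio(mlp_weight, attn_weight):
    """Ratio of MLP to attention spectral norms for one layer."""
    return spectral_norm(mlp_weight) / spectral_norm(attn_weight)

def in_stable_band(ratio, low=0.5, high=2.0):
    """Check the ratio against the 0.5-2 band reported in the post."""
    return low <= ratio <= high

mlp = [[3.0, 0.0], [0.0, 1.0]]
attn = [[2.0, 0.0], [0.0, 0.5]]
ratio = spectral_ratio(mlp, attn)
print(ratio, in_stable_band(ratio))
```

In practice you would pull the weight matrices out of a checkpoint and use a library SVD. The pure-Python power iteration is only there to keep the sketch dependency-free.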

Paper/Github repo: https://github.com/yousef-rafat/the-1-1-rule

u/Otaku_7nfy — 8 days ago
▲ 46 r/ResearchML+5 crossposts

TLDR: Just for fun, I put together a personal list of innovative AGI-oriented research labs, with a bias toward the under-the-radar ones. Not meant to be taken too seriously (I also don't know that many labs...)

---

I saw this article ( https://www.itweb.co.za/article/five-top-innovative-ai-research-labs-worth-knowing-about-in-2026/5yONPvErB317XWrb ) and it prompted me to make a list of the most innovative research labs still active in 2026. I don't really like their list because the labs mentioned are very product-oriented (which isn't a bad thing but doesn't fit the spirit of this sub).

In my list, I'll focus on labs that I am familiar with (I am fairly new to this field so I don't know a lot of them) and that have published something meaningful recently that I am aware of.

DISCLAIMER: The word "innovative" is debatable. To me, it's first and foremost a culture thing. That's why I also include labs that haven't published anything yet, but for which a clear research direction has been made public, or whose founders are known for their interest in fundamental research.

Here is my own version:

1- Google Research / DeepMind

Needs no introduction. Last year alone they proposed several breakthrough architectures (if not results-wise, at least conceptually). I included DeepMind but if I am honest, Google Research is the main provider of new architectural ideas.

Recent contributions:

  • The Hope architecture (for continual learning) - 2025
  • Titans (for long term memory) - 2024
  • Atlas (10M context-window) - 2025
  • Gemini Diffusion (for speed and reasoning) - 2025

2- FAIR (Meta)

Their name is literally "Fundamental AI Research". It doesn't get more explicit than that. They are responsible for some of the biggest breakthroughs in this field and were, for a long time, leaders in open source. They played a major role in pushing Self-Supervised Learning as the future of AI (especially vision).

Recent contributions:

  • Large Concept Model (for Language Modeling) - 2024
  • CoCoMix (for Language Modeling) - 2025
  • DINO V3 (for World Modeling) - 2025
  • V-JEPA 2 and 2.1 (for World Modeling) - 2025/2026

3- NVIDIA

They've been pumping out fundamental research papers for a minute now. Also, at least for AI, they seem to embrace open source. I find it interesting that they don’t just settle for being hardware providers but also actively develop competing architectures.

Recent contributions:

  • End-to-End Test-Time Training (for continual learning) - 2025
  • Mamba Vision (for World Modeling) - 2024
  • Cosmos World Model (for World Modeling) - 2025

4- NeuroAI Lab

I discovered this lab while making this list and they are super intriguing. Their work seems to revolve around applying insights from cognitive science (including psychology) to building novel architectures. They do a lot of interesting research on World Models as well. Very underrated, and arguably the most fitting lab for this sub

Recent contribution:

  • PSI World Model (for World Modeling) - 2025

5- VERSES

A research lab led by the world's most famous neuroscientist, Karl Friston. Like NeuroAI Lab, their work centers on bridging AI, biology and neuroscience. They are also probably extra incentivized to make their architectures biologically plausible given the identity of their founder. I am happy to see Friston finally take deep learning seriously. He has also published some bangers recently (see this)

Recent contributions:

  • The "Renormalizing Generative Model" architecture (for World Modeling) - 2024
  • Self-orthogonalizing attractor neural networks (for continual learning) - 2026

Note: I hesitated making a post on the Self-ortho paper but it didn't seem novel enough to me (barely any architectural innovations. They basically just modified a learning rule)

6- SAKANA AI

Another very fitting lab for this sub. They haven't published a lot yet, but their founder (who's also the co-inventor of Transformers) has clearly put emphasis on exploring weird and radically new ideas. He prides himself on giving his researchers as much freedom as possible to investigate whatever captures their curiosity.

Recent contribution:

  • The "Continuous Thought Machine" architecture (for reasoning/system 2 thinking) - 2025

7- AMI Lab

Co-founded this year by Yann LeCun. They pursue fundamental, open-ended research and aim to publish every single theoretical paper. Given LeCun's background, AMI will focus on World Models powered by Energy-Based approaches.

  • No paper yet.

Note: since leaving Meta, their founder has been publishing papers left and right (LeWM, KONA, V-JEPA 2.1, Causal-JEPA, Lesson on autonomous learning systems, etc.)

8- NDEA

Founded by the creator of ARC-AGI, François Chollet. Their program revolves around Symbolic Descent as a path to AGI, which is a symbolic system attempting to incorporate the flexible learning and scalability of modern AI. Their founder is very opinionated about AI and has a lot of conceptual takes on what is missing for AGI, which makes them slightly more interesting to me than World Labs. I can't wait for some research paper!

  • No paper yet.

9- World Labs

Launched by AI godmother Fei-Fei Li. They are looking to achieve "Spatial Intelligence", which is essentially another word for World Models. I haven't been super impressed by what they've published so far (it's really just virtual worlds built on current architectures) but I like how ambitious their vision is.

Recent contributions:

  • Marble / Large World Models (for World Modeling)

HONORABLE MENTIONS

Ilya's SSI (no paper or even a conceptual idea), MIT (I don't know them enough), Pathway, Silver's Ineffable ...

I could have also included innovative AI hardware companies like Extropic and Lightmatter (since having the right flexible hardware could be a prerequisite for AGI)

u/Tobio-Star — 11 days ago
▲ 12 r/ResearchML+2 crossposts

Looking for arXiv endorsement (cs.CV) to post my ViT positional embeddings paper [R]

Hi everyone,

I'm looking for someone to endorse me for arXiv submission in cs.CV (computer vision) or cs.LG. I have a completed paper and want to upload it as a preprint.

About the paper:

Title: Positional Encodings in Vision Transformers: A Geometric Account of Spatial Organization and Robustness

Summary: This paper investigates how different positional encoding schemes (learned absolute, sinusoidal, and rotary) shape the internal representations of Vision Transformers. We introduce a metric called Spatial Similarity Distance Correlation (SSDC) to quantify spatial structure in token representations. Using controlled interventions (random permutation at inference, random permutation training, and positional magnitude scaling), we show that:

  1. ViTs develop non‑trivial spatial structure even without positional embeddings, but this structure is content‑driven and collapses under token permutation.

  2. All positional encodings shift models toward index‑anchored spatial organization that persists under content disruption.

  3. Robustness to distributional shifts (JPEG compression, Gaussian blur) is primarily associated with the presence of a stable positional reference frame and correlates directly with SSDC as measured under intervention.

The paper includes experiments on ImageNet‑100 with ViT‑S models, multiple random seeds, and full statistical reporting.

PDF available at: https://github.com/mahmoud-mannes/neurips-geometry-paper/blob/main/paper/main.pdf

u/Octacinth — 9 days ago

4-bit weight quantization with a log-spaced codebook (PBF4) — bnb + llama.cpp implementations

***Updated, added more models + longer runs***

Built a 4-bit weight quantization format called PBF4. The 16-entry codebook is sampled every-other-level from an 8-bit log-polar ("PBF8") spine with irrational base φ+π and step ln(8)/16; layout is NF4-style 7 negatives + 0 + 8 positives. No calibration — same codebook for every tensor.

Implementations in bitsandbytes (Python + CUDA/HIP, mirrors the fp4/nf4 paths) and llama.cpp (PBF-MX block format + a multi-spine PBF-MX-T variant).

Per-tensor evaluation: 58 real weight tensors from 7 architectures (Qwen 0.5B, SmolLM-360M, TinyLlama, OLMo-1B, GPT-2, Granite-2B, Mamba-370M). PBF4 wins 57/58 vs NF4 on x²-weighted MSE (the metric that tracks matmul-output impact), with 20–28% error reductions. The trade: PBF4 is 24–31% worse on plain abs error — log spacing sacrifices small-value precision to better preserve large values, which dominate matmul outputs.
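To illustrate the difference between the two metrics, here is a small sketch. It assumes "x²-weighted MSE" means the squared quantization error of each weight multiplied by the square of that weight, then averaged. That is my reading of the description, not a definition taken from the repo. The codebook and weights are toy values, not the PBF4 or NF4 codebooks, and per-block absmax scaling is left out.

```python
def quantize(x, codebook):
    """Map x to the nearest codebook entry (round-to-nearest, no scaling)."""
    return min(codebook, key=lambda c: abs(c - x))

def mean_abs_error(xs, codebook):
    """Plain absolute error: every weight counts equally."""
    return sum(abs(x - quantize(x, codebook)) for x in xs) / len(xs)

def x2_weighted_mse(xs, codebook):
    """Squared error weighted by x^2, so errors on large weights dominate (assumed definition)."""
    return sum((x * x) * (x - quantize(x, codebook)) ** 2 for x in xs) / len(xs)

# Toy codebook and weights, for illustration only.
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
weights = [0.05, -0.3, 0.45, 0.9]
print("mean abs error:  ", mean_abs_error(weights, codebook))
print("x2-weighted MSE: ", x2_weighted_mse(weights, codebook))
```

Under this definition, a codebook can lose on plain absolute error and still win on the weighted metric if it places its levels where the large-magnitude weights are.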

End-to-end (wikitext-2, n_ctx=512, 30–80 chunks):

| Model | Scale | PBF-MX-T (bpw / PPL) | Q4_K_M (bpw / PPL) | Δ PPL | Δ BPW |
|---|---|---|---|---|---|
| Qwen3-0.6B | 0.6B | 4.78 / 29.60 | 5.09 / 23.54 | +6.05 | +0.31 |
| TinyLlama-1.1B | 1.1B | 4.45 / 9.68 | 4.85 / 9.19 | +0.49 | +0.40 |
| Granite-3.3-2B | 2B | 4.40 / 10.20 | 4.87 / 8.63 | +1.57 | +0.47 |
| Qwen2.5-7B | 7B | 4.47 / 6.23 | 4.91 / 5.99 | +0.23 | +0.44 |
| Mistral-7B | 7B | 4.35 / 5.61 | 4.83 / 5.50 | +0.11 | +0.48 |

Important caveat: Q4_K_M is mixed-precision. It keeps ~1/3 of weights at q6_K (embedding, lm_head, per-layer attn_v / ffn_down). PBF-MX-T quantises everything at 4-bit except output.weight. So the bpw delta understates how much more aggressive PBF-MX-T's 4-bit coverage is; a like-for-like comparison should narrow the PPL gap. Haven't run that experiment yet.

github.com
u/Anxious-Visit-7735 — 9 days ago

Sorry if this post is a bit unorganized or not allowed, I just wanted to give a brief background of myself and ask a few questions about potential careers in this field.

I have had my BS in Computer Science since 2025 and only 1 real internship experience, where I was part of a small GeoAI lab at my university and developed a Python data mining and cleaning script for the PI; the data would be used to train a model. Other than that I had no other internships, co-ops, or a job post-grad, as I kept getting rejected or failing 3-4 rounds in. Admittedly, my side projects throughout school were just simple websites or class projects; I wish I could go back and focus on those more rather than on grades, and network more. Lately I have had informal positions where I was essentially fixing bugs in people's iOS apps or fixing the UI of their vibe-coded codebases. As the rejections kept piling up I became more and more depressed, and one day I realized that even landing a cushy tech job would still be depressing to me, since it's not the type of work I know would fulfill me. I know I have the privilege to even say that, but I thought back to what a younger me really wanted to do in this world, and it was more along the lines of research and progressing humanity rather than creating dashboards or keeping people on their phones longer and yada yada.

So I started looking into the health field and how I can still apply my Computer Science skills, and applied for some master's programs dealing with biomedical & health science using AI. I have been accepted into a program and I am more motivated than ever to actually learn and contribute to this industry, but now I find myself lost on where to start, whether I even have the smarts to get up to speed and join a research lab quickly, and what sort of career options I have after this program. Since I am better prepared than before, I have researched a bit on getting started with simple AI health projects that progressively advance, the types of companies and positions I should be looking out for, and conferences, career fairs, and such. I know that I want to get into systems dealing with medical imaging, clinical decision support, or drug development.

Still, I feel lost as to how I should be reading and taking notes on papers, how to reach out to labs (whether for a paid position or as a volunteer), where to find internships / co-ops dealing with health AI, and what I should be focusing on to land a career in this field after grad. I am also a bit afraid that the type of work I actually want to do for a living is more for those with PhDs, so I still have some doubts about what the future holds for me.

Once again sorry for the incoherent ramble and if there is anything that needs to be clarified or doesn't make sense I'll be happy to answer. And if anyone has some advice on how to go about this I will be reading very intently. Thanks

u/EconomyImpact7998 — 12 days ago