u/TMNguyenSFT

What if you can actually get good work out of AI?

EDIT: It seems people are misreading the claim in this post. The claim isn't that LLMs can do physics autonomously. The claim is that we can make the process more auditable and verifiable regardless of the final product. I've also asked that people not conflate the underlying physics theory with the topic of this paper, which is making the LLM workflow more auditable and verifiable. I will not respond to any comments about the physics theory directly, only about how we make working with LLMs better.

EDIT 2: ESP has nothing to do with how you prompt an LLM. It's about how you manage the underlying project and enforce provenance.

Alright everyone, time for a slightly different kind of post. This isn't another ToE post (though a ToE is the subject of the paper). Here's the paper for this post: https://zenodo.org/records/20077372

We've all been there... some of us more than others (yours truly): you're using an LLM to help with a derivation, it gives you a confident answer, the math looks right, the notation is clean, and three hours later you realize it hallucinated a coefficient or silently dropped a term. If you're lucky, it stayed between you and the LLM. If you weren't lucky, you showed it to others and now you just want to bury your head in the sand.

The problem isn't that LLMs are incapable of physics or of making genuine contributions; it's that there's no structure forcing them to distinguish between what's derived, what's approximated, what's imported from somewhere else, and what's just been hallucinated with perfect confidence.

During my struggles with the development of my own theory, I ended up building something that I believe addresses this at the structural level.

The framework is called Executable Scientific Provenance (ESP), and it came out of my own theory and my relentless push for it to be taken seriously. I was trying to manage a rather large theory: 100+ derived parameters/observables, a Lean 4 repository with 800+ theorems, and ever-increasing complexity and precision needs. The LLM-assisted workflow kept running into the same failure modes: stale values propagating silently, approximations losing their approximation status as they moved downstream, and no clean way to tell which claims were fully derived versus conditionally inferred. Randomly dropped terms were a real thing, and equations would shift slightly from paper to paper.

It was getting out of hand, so I took a sabbatical from developing the theory, did a top-down audit, and implemented this framework to better enforce my standards and get the quality I wanted from the LLMs assisting me with the theory.

So together with the LLMs, we developed a framework that assigns every scientific claim an explicit status (a minimal Lean 4 sketch follows the list):

  • CERTIFIED — full dependency traceability, all lemmas closed, zero sorry
  • CONDITIONAL — structurally sound but with an identified open surface
  • BOUNDED — proved interval constraint, not a unique value
  • SYMBOLIC — functional form derived, numerical closure incomplete
  • AGGREGATION_UNDER_REVIEW — constituent sectors certified but how they compose is contested
  • DEPRECATED — retained for audit history only
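
To make this concrete, here is a minimal, hypothetical Lean 4 sketch of the six statuses as an inductive type (the type and constructor names are mine for illustration, not taken from the paper's repository), so a claim's status is a value a checker can inspect rather than a label in prose:

```lean
-- Hypothetical sketch: the six ESP statuses as an inductive type.
inductive ClaimStatus where
  | certified              -- full traceability, all lemmas closed, zero sorry
  | conditional            -- structurally sound, identified open surface
  | bounded                -- proved interval constraint, not a unique value
  | symbolic               -- functional form derived, closure incomplete
  | aggregationUnderReview -- sectors certified, composition contested
  | deprecated             -- retained for audit history only
  deriving Repr, BEq
```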

Every claim lives in a dependency directed acyclic graph (DAG). Every number has a normalization epoch. Every derivation chain is traversable. When something goes wrong (and in LLM-assisted research, something always goes wrong), the provenance graph tells you exactly which layer the failure originated in rather than leaving you to debug the entire pipeline.
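
Building on the ClaimStatus sketch above, a provenance node might look like the following. This is a hedged sketch with illustrative field names, not the paper's actual schema: each claim records its status, the normalization epoch its numbers were locked at, and the upstream claims it depends on, which is what makes failure localization a graph walk rather than a full-pipeline debug.

```lean
-- Hypothetical sketch of a provenance node in the dependency DAG.
structure ClaimNode where
  name   : String       -- identifier of the claim
  status : ClaimStatus  -- one of the six ESP statuses
  epoch  : Nat          -- normalization epoch the values were locked at
  deps   : List String  -- names of upstream claims in the DAG
  deriving Repr

-- Walk one layer upstream: which dependencies of `c` are not yet
-- certified? Following these from a failing node localizes the layer
-- where the failure originated.
def openDeps (nodes : List ClaimNode) (c : ClaimNode) : List String :=
  c.deps.filter fun d =>
    !(nodes.any fun n => n.name == d && n.status == ClaimStatus.certified)
```

Epoch locking is then just a comparison of `epoch` fields before a value is reused across sessions.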

The LLM angle: This architecture is a natural fit for LLM-assisted research workflows. The common pain points map directly onto provenance failures:

  • Hallucinated coefficients → uncontrolled values entering the chain without a Class A/B anchor
  • Sycophancy → results promoted to CERTIFIED status without closing the open lemmas or enforcing provenance
  • Stale context → drift across sessions, fixed by epoch locking
  • No way to tell what the LLM actually derived vs. recalled → the SYMBOLIC vs. CERTIFIED distinction makes this explicit

Structuring your LLM-assisted derivations inside the ESP framework, even informally, gives you a natural checklist: did the LLM close all the lemmas or leave some implicit? Is this result derived or imported from somewhere? Is this a bounded claim or a point estimate? The framework doesn't prevent LLM errors, but it makes them visible and localizable rather than letting them propagate silently and making you look like a fool later.
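
That checklist can itself be made executable. As a hedged sketch in the same hypothetical types as above (the function and the openLemmas count are illustrative, not from the paper), a promotion gate refuses CERTIFIED status unless every upstream dependency is already certified and no lemmas remain open:

```lean
-- Hypothetical promotion gate: a claim may only be marked certified
-- when it has no open lemmas and no uncertified dependencies.
def canCertify (nodes : List ClaimNode) (c : ClaimNode)
    (openLemmas : Nat) : Bool :=
  openLemmas == 0 && (openDeps nodes c).isEmpty
```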

Lean 4 is what makes the provenance claims checkable rather than merely asserted: every node in the dependency graph that carries a CERTIFIED tag has a corresponding Lean 4 proof term that a computer has verified, meaning the derivation chain is not just documented but mechanically confirmed to be gap-free. The zero-sorry policy enforces this strictly: sorry is Lean 4's escape hatch for unproven steps, and banning it from certified sectors means there is no way to mark something CERTIFIED while hiding an unverified assumption inside it. Lean 4 will also expose plausible-looking math that doesn't work: it won't compile cleanly if the math is wrong.
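
Here's a minimal toy illustration of why the zero-sorry policy is mechanically checkable (these theorems are mine, not from the repository): sorry compiles, but Lean warns about it, and #print axioms exposes the gap it leaves behind:

```lean
theorem closed_step : 2 + 2 = 4 := rfl    -- fully verified, no gaps
theorem open_step   : 1 + 1 = 3 := sorry  -- compiles, but Lean warns

#print axioms closed_step  -- "does not depend on any axioms": gap-free
#print axioms open_step    -- depends on sorryAx: the hidden assumption
```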

And Lean 4 does not care about your feelings, public speaking ability, credentials, or connections, or whether the idea was generated by a human or an LLM. It either compiles or it doesn't. Unforgivingly.

I've had great success since implementing this framework in my LLM-assisted research. Each correction was localized without touching the certified sectors. That's what you want in a large LLM-assisted research project: failures that are diagnosable rather than catastrophic. And I believe this can help other researchers who are interested in using LLMs.

The paper on the framework is linked at the top of this post. The paper itself was LLM-generated from the project notes and applies the ESP methodology to itself.

NOTE: My theory is the subject of the paper but not the topic; it's only used as the demonstration of the method. Please don't comment on the theory itself, just on this method of enforcing quality in LLM-assisted research. If you are interested in the theory itself, it is listed as related work on the Zenodo page.

u/TMNguyenSFT — 6 days ago

So I've been working on a theory that extends the Skyrme models of old. The core idea is that a single SU(11) nonlinear field that can support topological solitons can explain everything in physics: gravity, the full Standard Model, cosmology, and more. All without extra dimensions, SUSY, a landscape, HLS, dualization, inducing, etc.

Quick history for those not in the know: Tony Skyrme created the first Skyrme theory back in 1961 as a model for baryons. It pretty much died on arrival until Edward Witten revived it in the 80s by connecting it to large-N QCD. Skyrme models are still used in nuclear physics but stop at hadrons.
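
For readers who haven't seen it, the classic two-term Skyrme Lagrangian (standard textbook background, not the six-term Lagrangian of this theory) for an SU(2)-valued field U is, in one common convention, schematically:

L = (f_π²/4) Tr(∂_μU ∂^μU†) + (1/(32e²)) Tr([U†∂_μU, U†∂_νU]²)

where f_π is the pion decay constant and e is the dimensionless Skyrme parameter; the quartic commutator term is what stabilizes the soliton against collapse.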

My theory takes the next step by taking the soliton and embedding it in an SU(11) field described by a six-term Lagrangian that is physically motivated by what the field needs to be able to do.

It needs to be able to propagate, support stable massive solitons that are topologically protected, self-interact, and couple to matter. This was originally motivated by the three primitives: the field itself, the solitons ("Selflitons," as I dubbed them), and the ability to couple and record (explained in the paper).

The full record is on Zenodo; the link is at the bottom of this post. It is very dense, and the Lake project is very large (95+ files, 800+ lemmas/theorems, 8k+ processes).

AI disclaimer: I use AI as a research assistant (fetching data, looking up papers, writing code, double-checking math, formatting the paper, etc.). All outputs are verified through cross-checks against the existing literature, my own expertise/intuition, and Lean 4.

Just to help smooth things over, I've developed an FAQ of sorts:

  1. Derivations and Dynamics - I provide full step-by-step derivations. I try to be as explicit as possible without going overboard with the details. Every integral is shown, every normalization factor is calculated. The appendices themselves are full of valuable clarifications, expansions, and explanations. If you have a question, it is probably answered in the paper somewhere.
  2. Experimental matches and observables - My theory has well over 100 observables at this point, all derived from the Lagrangian and formalized in Lean 4. Every step from Lagrangian to final output is in Lean 4 to ensure mathematical consistency and accuracy.
  3. Falsifiability - Many of the observables are not only postdictions of existing data but also novel predictions. Some probe outright uncharted territory; others are small but detectable deviations.
  4. Use of established physics - The theory doesn't try to disprove existing theories; it tries to complete them. This is especially true for the Standard Model and GR. Even String Theory has a place.
  5. Novelty - My theory is not a resonance theory. It is not holographic. It is not some sacred geometry using the golden ratio in every calculation. It is a single classical field described by a six-term Lagrangian.
  6. Rigor - I try to make the math as explicit as possible with no black boxes. The full theory is formalized in Lean 4: over 95 files, 800+ theorems/lemmas, and over 8k processes, all with 0 sorries.
  7. Honest caveats - There may still be some inconsistencies due to typos or old forms that I missed during the editing process. Please flag these if you find them so I can fix the paper. Also, the interpretive layers are speculative in nature. They are my personal reading of the mathematics, but I am open to being wrong here since it is a philosophical layer on top of the mathematical layer. Not married to it either way.

Lastly, the theory is heavily constrained. It has no free parameters, and this is certified in Lean 4 (Strong Form). No parameter can be altered: change any parameter used in the calculations and the whole thing falls apart. Literally. You will see that every parameter is used in multiple locations across multiple domains. There are no single-use parameters for singular circumstances.

Check it over thoroughly and show me where it breaks. I'm seeking the truth, not validation.

Link to paper:

https://zenodo.org/records/19653976?preview_file=SFT+-+Extended+Skyrme+Type+Theory.pdf

u/TMNguyenSFT — 11 days ago