u/Echo_Tech_Labs

We piloted an AI writing framework with 26 students. The students reported thinking more, not less.

A few notes before you move on to the post itself.

A: AI was used as an assistant writer to compile this post.

B: This is NOT a controlled study but rather a proof of concept for an idea.

C: I DO NOT have a PhD in any field. I am doing this as a personal project.

D: We worked on this project with very limited resources and were heavily constrained by policy.

Thank you for your time and contribution 🙏 😊

Recent research keeps landing on the same uncomfortable claim: AI makes people think less.

The MIT "Your Brain on ChatGPT" preprint, the Microsoft/CMU work on AI and critical thinking, and the broader cognitive offloading literature all point that way. Under the default workflow, I think they're probably right.

The MIT preprint is especially relevant because it looked directly at essay writing. Participants who used LLMs for essay tasks showed weaker brain connectivity, lower reported ownership of their essays, and more difficulty recalling or quoting their own work compared with participants who wrote without tools or used search engines. The paper is still a preprint, so I'd be careful treating it as settled science. But the pattern it describes is what teachers are seeing in classrooms right now: students producing writing without fully processing it.

The Microsoft/CMU study points the same way from a workplace angle. People tended to think less critically when they had high confidence in AI, and more critically when they had higher confidence in their own ability. The study also found that GenAI shifts critical thinking toward verification, response integration, and task stewardship. The risk isn't AI use; it's AI use without task stewardship.

If the workflow is "prompt in, essay out, copy-paste, submit," then of course cognitive engagement drops. There's nothing mysterious about it. That's what offloading means.

In principle, this implicates the workflow rather than the tool itself.

Read carefully: these studies don't actually argue against AI in education. They identify the conditions under which AI use degrades thinking. Ownership collapses when the student doesn't have to account for their own choices. Critical thinking collapses when confidence in AI is high and confidence in self is low. The cognitive work that survives, according to Microsoft, moves toward verification, integration, and oversight.

The framework I'll describe was developed independently, but it lines up with that mechanism almost point for point. Ownership is preserved because the artifact trail makes every step traceable to a student decision. Self-confidence is built because students watch their own arguments survive attack, or learn precisely where to repair them. And the cognitive work is deliberately concentrated where Microsoft says it has to go under AI conditions: judging which objections matter, verifying what the model produces, integrating it into a structure the student already chose.

So I'm not arguing against the research. I'm arguing that the research describes a failure mode, and that failure mode can be mitigated.

But it does require design. Most students are using AI as an output machine. The question I've been working on for the last six months is whether you can design a writing process where offloading is structurally impractical. Honor codes and detection software won't cut it anymore.

Can you move students from offloading to interfacing?

By offloading, I mean using AI to replace the thinking. By interfacing, I mean using AI as something the student has to respond to, question, revise against, and judge. In one case, the model becomes a ghostwriter. In the other, a sparring partner.

We piloted one version of that. Here's what happened.

The setup

My partner, an AP Literature teacher, and I designed an AI-integrated argumentative essay framework: six scaffolded steps, each producing an artifact the student carries forward into the next step.

The defining mechanic is something I'd call structural anti-plagiarism.

AI can still be used at every step, but the accumulated artifact trail makes simple outsourcing awkward and cognitively taxing. The student has to keep returning to prior choices, explain them, revise them, defend them. Copy-paste stops being the shortcut. It effectively becomes the "long way around".
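The post doesn't enumerate the six steps, so here's only a minimal sketch of the artifact-trail mechanic, assuming each step's deliverable must link back to, and justify, the one before it. The structure, names, and fields are my illustration, not the pilot's actual materials.

    # Hypothetical sketch of an artifact trail (Python). A pasted-in final
    # essay fails the mechanic because it has no justified links back
    # through the student's earlier choices.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Artifact:
        step: int                    # position in the scaffolded sequence
        content: str                 # what the student produced here
        justification: str           # why prior choices were kept or revised
        prior: Optional["Artifact"]  # link to the previous step's artifact

    def carry_forward(prev: Artifact, content: str, justification: str) -> Artifact:
        # Every new artifact must explicitly account for the one before it.
        return Artifact(prev.step + 1, content, justification, prior=prev)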

The core move is red-teaming. Students use AI as an adversarial lens to attack their own argument. Not to write it but to break it. Defending an argument against critique is much harder than generating one in a vacuum. A blank page lets students drift. A hostile critique forces them to decide what they actually believe, what they can defend, and what needs repair.
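For concreteness, here's a minimal sketch of what an adversarial prompt of that kind might look like, assuming a chat-style model. The wording and helper function are my illustration, not the pilot's actual prompt.

    # Hypothetical red-team prompt builder (Python): the model is told to
    # attack, never to repair or rewrite. Repair stays with the student.
    def red_team_prompt(thesis: str, evidence: list[str]) -> str:
        points = "\n".join(f"- {e}" for e in evidence)
        return (
            "Act as a hostile reviewer. Do not rewrite or improve this argument.\n"
            f"Thesis: {thesis}\n"
            f"Evidence:\n{points}\n"
            "List the three strongest objections, the weakest piece of evidence, "
            "and one counterexample the author has not addressed."
        )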

We ran the pilot across two cohorts: 26 students total. Anonymous pre/during/post survey, 5-point Likert scales.
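One reading aid for the numbers below: with n = 26, a single student is worth roughly 3.8 percentage points, so every figure maps back to a whole-student count. A quick sanity check (my arithmetic, not part of the survey):

    # Each reported percentage corresponds to a whole number of students.
    n = 26
    for pct in (80.8, 84.6, 96.2, 76.9, 53.8):
        print(f"{pct}% of {n} ≈ {round(pct / 100 * n)} students")
    # → 21, 22, 25, 20, and 14 students respectively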

What the data showed

The pre-survey baseline matched what you'd expect:

  • 80.8% said the hardest part of writing an essay was "finding a strong argument." Not grammar, not word count.
  • 73% rated themselves 4 or 5 on "I thought AI was smarter than me at forming arguments."
  • 50% said they had previously thought using AI for schoolwork was basically cheating.
  • 38.5% rated themselves 4 or 5 on "I thought AI could just write a good essay for me if I asked."

Standard incoming beliefs: AI is smarter than me, AI is for cheating, AI can write the essay if I let it.

The post-survey is where it gets interesting:

  • 84.6% rated their argument-construction ability as somewhat or much stronger after the program.
  • 96.2% rated themselves 4 or 5 on understanding the difference between AI as a tool versus replacement.
  • 100% scored 3 or higher on "using AI made me think more, not less," with 65.4% scoring 4 or 5.
  • 76.9% said they could now see why submitting AI work would not work.
  • 73.1% rated the program overall as good, with another 11.5% rating it excellent.

The red-teaming step produced the cleanest signal. 84.6% said red-teaming made their argument stronger. 53.8% reported feeling more confident after having their argument attacked.

That last result surprised me. Getting your work picked apart usually reduces confidence. Here, it often raised confidence. The students either saw their arguments survive pressure or learned exactly where to repair them.

The broader shift was also interesting. 24 out of 26 students said the program changed how they would use AI in the future. This suggests the issue may not be that students are naturally lazy with AI. Rather, they may simply have never been given a serious protocol for using it well.

Once the tool stops feeling like a black box, students can start seeing it as something more limited and more useful: a system that surfaces options, objections, and weak points but still needs a human being to decide what matters.

What I think actually happened

I want to be cautious about claims here. I'm a self-taught practitioner, not a researcher with a PhD, and 26 students is a pilot, not a full study.

But the mechanism seems clear enough to discuss.

The framework doesn't reduce cognitive load. It redistributes it.

The extraneous load (formatting, generating counterexamples, surfacing possible objections, finding directions to investigate) can be partially offloaded to AI.

The germane load stays with the student: choosing the argument, evaluating which counterargument actually threatens the thesis, deciding what evidence matters, revising weak points, defending the final structure.

That's the important difference.

The research on AI and cognitive offloading generally describes what happens when AI takes over the thinking layer. Our pilot tested something different: what happens when AI is allowed to support the surface layer, but the student is still forced to do the judgment layer.

When a student uses AI to attack their own essay, they're forced into a meta-position relative to their own reasoning. They have to evaluate and judge whether the AI's criticism is valid. Then they have to defend, revise, or discard.

That isn't offloading. That's coupling: the student's cognition and the AI's output operating as a loop, with the student still in the evaluative seat.

Another way to put it: the student isn't just learning how to prompt. They're learning how to structure a thinking process.

They have to break the essay into a spine and components. They have to define constraints before generating content. They have to decide what the AI is allowed to do and what it isn't; a toy version of that boundary-setting is sketched below. That's important because it keeps agency with the student. The model can explore possibilities, but the student sets the boundaries and makes the final judgment.
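As a toy illustration, a per-step permission spec might look like the following. The step names and action categories are hypothetical, not taken from the pilot.

    # Hypothetical permission spec, declared by the student before any
    # generation happens: what the model may do at each step, and what it may not.
    AI_PERMISSIONS = {
        "thesis":   {"allowed": [],                    "forbidden": ["draft", "suggest"]},
        "evidence": {"allowed": ["surface leads"],     "forbidden": ["draft"]},
        "red_team": {"allowed": ["attack", "object"],  "forbidden": ["repair", "rewrite"]},
        "revision": {"allowed": ["check consistency"], "forbidden": ["rewrite"]},
    }

    def may(step: str, action: str) -> bool:
        # The student, not the model, owns this gate.
        rules = AI_PERMISSIONS[step]
        return action in rules["allowed"] and action not in rules["forbidden"]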

That's where I think a lot of AI writing assignments go wrong. If the AI enters too early, before the student has defined the shape of the problem, the model can quietly steer the whole direction of the work. The student thinks they're choosing, but really they're selecting from a path the model already opened.

The framework tries to reverse that order.

First, the student defines the structure. Then AI enters inside that structure.

That doesn't eliminate the risk of AI steering the student, but it reduces it significantly. The student has already made decisions the AI has to respond to.

The strongest unexpected side effect was epistemic hygiene.

Students started fact-checking AI output without being heavily pushed to. They started noticing how easy it is for AI to sound confident while being wrong. They started recognizing weak reasoning, including their own.

On the post-survey:

  • 38.5% rated "AI can sound confident while being wrong" as a 4; another 26.9% rated it a 5.
  • 61.5% rated themselves a 4 on noticing their own weak reasoning; another 23.1% a 5.

We trained argument construction. What appeared alongside it was something broader: students treating AI output as something to be tested rather than trusted, then turning that same scrutiny back onto their own thinking.

That's significant, because the real problem isn't only plagiarism. It's epistemic posture. Does the student treat AI as an answer machine or as something to interrogate? Does the workflow reward submission or judgment?

What I'm not claiming

I'm not claiming this generalizes from 26 students to every classroom.

I'm not claiming the studies showing AI reduces engagement are wrong. Under the conditions they tested, they're probably right.

I'm not claiming this framework is the only design that can produce these effects.

What I am claiming is that the cognitive impact of AI is downstream of how the task is structured.

"AI makes people think less" isn't really a statement about AI in isolation. It's a statement about a workflow.

Change the workflow and you change the cognitive conditions being measured.

That's the part I think deserves more attention. The same model can be used to outsource an essay or stress-test an argument. The difference isn't the model. The difference is the instructional design around it.

This probably applies beyond English essays. In science, AI could predict edge cases in an experimental design. In history, it could test the internal consistency of a thesis from multiple perspectives. In literature, it could attack an interpretation before the student finalizes it. The pattern is the same: don't ask AI to produce the final answer. Use it to pressure the student's thinking before the final answer exists.

If anybody has questions, wants to see more data, or wants to see the framework itself, send me a DM and I will get back to you as soon as possible.

Thank you for your time.
