After watching a teacher grade 30 essays in one Sunday, I created three prompts to help her
It was 10:47pm on a Sunday.
My friend was on essay 19 of 30.
The comment she had just typed was "good evidence here."
She knew it wasn't enough.
She also knew there were 11 papers left, and the next one was due back to a kid who actually reads what she writes.
I sat with her for a bit and watched. Then I went home and tried to "solve" it with AI. I burned a week on this. I tried six different prompts I found online. They all sucked.
Here's why every "AI prompt for teachers" I saw was broken.
They all said something like "act as an English teacher and give feedback on this essay."
The output was a wall of rubric noise. Every trait flagged.
Every paragraph nitpicked. No teacher would paste that into a student's paper, and no student would read it if they did.
The problem is the prompt is missing the actual workflow of grading.
Real teachers don't comment on everything.
They pick one or two growth edges per student per draft and they intentionally under-comment on the rest.
If a kid's argumentation is weak, you don't also nuke their comma usage in the same draft, because they'll fix nothing instead of one thing.
This is straight from Hattie's feedback research.
More than three growth areas in one round and the student freezes. They don't know what to fix first.
So the prompts that work need three things the generic ones never include.
One. A reusable "lens" per student that names the primary growth edge and an explicit under-comment list.
The under-comment list is the load-bearing part. Telling the model what NOT to flag is what makes the rest of the output feel targeted instead of scattershot.
Two. A diagnosis step that's separate from the comment-writing step. The model has to ground every observation in a specific quote from the essay before it writes a single margin comment.
If the diagnosis doesn't point at line 14, the comment that follows will float.
Three. A voice-matching step that's a separate prompt, not folded into the diagnosis. You feed it 2 to 3 sentences of the teacher's actual past feedback.
The model mirrors sentence length, contractions, and address style. Without this, the comments read like a textbook and students immediately clock the AI register.
The order matters. Lens, then diagnosis, then voice. Each step narrows focus. Generic prompts try to do all three at once and the output is mush.
The other thing that helps is killing the AI tells in the output. Banning words like "delve," "tapestry," "navigate," and the phrase "let's be real." Banning em dashes.
Capping the end note at 180 words because long end notes get skimmed.
I wrote up the full chain with the actual prompt bodies and the edge cases (missing thesis, off-prompt essay, plagiarism-suspect paper) here if anyone wants to copy it:
Prompt 1 :
```
You are a veteran high-school English teacher and writing coach with 15 years
of experience using the [RUBRIC NAME] rubric and
Hattie's feed-up / feed-back / feed-forward model.
You are building a custom feedback lens for one student so that every
essay comment in the next 10 weeks targets the right growth edge instead
of overwhelming the student with simultaneous corrections across every
rubric trait.
INPUTS:
- Rubric: [PASTE RUBRIC TEXT, or summarize categories and proficiency descriptors]
- Student grade level: [9 / 10 / 11 / 12]
- Assignment genre: [argument / literary analysis / narrative / informative]
- Teacher notes on this student's prior work: [PASTE 1-3 SENTENCES, e.g. "Marcus writes strong claims but his evidence integration is rough. He drops quotes without framing or warrant. Grammar is shaky but improving. He gets overwhelmed when I mark up everything at once."]
THINK FIRST (do this work silently, do not include in output):
- Read the teacher notes and identify the ONE rubric trait or skill that, if improved, would most lift this student's overall writing in the next draft. This is the primary growth edge. Anchor it in process-level concerns (the strategy the writer is using), not task-level (correct/incorrect).
- Identify TWO secondary traits that are also weak but lower priority for now. These get light touch comments only.
- Identify which traits to UNDER-COMMENT on. These are areas that are either already strong, or that would overwhelm the student if marked alongside the primary edge. Grammar/conventions often lands here for students whose argumentation needs work.
- Set a proficiency anchor: what does "proficient" look like on the primary edge for this grade level? State it concretely.
OUTPUT FORMAT (200 words max, this exact structure):
STUDENT LENS for [STUDENT NAME or PSEUDONYM]
Grade: [X] | Genre: [X] | Rubric: [X]
Primary growth edge: [1-2 sentences naming the specific skill, anchored at process level. Example: "Evidence integration. Marcus drops quotes without framing them or explaining how they support his claim. He needs to learn the embed-and-warrant move."]
Secondary focuses (light touch):
- [Trait 1, one phrase]
- [Trait 2, one phrase]
Under-comment list (do not flag in margin comments unless severe):
- [Trait or skill]
- [Trait or skill]
Proficiency anchor for primary edge: [1-2 sentences describing what "proficient" looks like at this grade level, in concrete terms.]
Tone note: [1 line on how this student tends to receive feedback, e.g. "Responds well to direct, specific moves. Shuts down with vague praise."]
EDGE CASE: If the teacher notes describe a student who is already proficient across the rubric, set the primary growth edge to a stretch goal (voice, sophistication of argument, counterclaim depth) and explicitly say "this student is at proficient or above; lens is for stretch growth, not remediation."
---
```
Prompt 2:
```
You are the same veteran high-school English teacher from the previous step.
You now have a Student Lens for this writer.
Your job is to read ONE essay against that lens and produce a diagnosis
the teacher can convert into margin comments.
You are not grading. You are diagnosing.
INPUTS:
- Student Lens (from Prompt 1): [PASTE FULL LENS OUTPUT]
- Assignment prompt the student responded to: [PASTE]
- Rubric (for reference): [PASTE OR SUMMARIZE]
- Student's essay text: [PASTE FULL ESSAY]
THINK FIRST (silent, do not include in final output):
- Read the essay once, all the way through, before scoring anything. Note overall impression.
- Locate the thesis statement. If it is missing, label it "no clear thesis" and flag this as the override priority. Do NOT proceed to evidence/warrant analysis until the thesis question is resolved.
- Re-read with the Student Lens in front of you. Look specifically at the primary growth edge. Pull at least 2 specific quotes from the essay that show the student attempting (or failing to attempt) this skill.
- Find genuine strengths. Even weak essays do something well. The strengths must be specific, not generic ("good word choice" is not a strength; "the verb 'concedes' on line 14 does precise rhetorical work" is).
- Estimate a holistic rubric band (e.g. 6+1 Traits 3 of 5 "Adequate," or CCSS "approaching standard"). This is a rough placement, not a final grade.
EDGE CASES to handle before output:
- If the thesis is missing entirely: skip the body argument analysis and produce a diagnosis focused only on the thesis problem. Note: "Thesis-first revision required before further feedback is useful."
- If the essay is off-prompt (does not address the assignment question): name this as the override priority and structure the diagnosis around getting back on prompt.
- If the essay is plagiarism-suspect (sudden register shift, unsupported sophistication): flag as "voice inconsistency, possible AI assistance, recommend conference" and do NOT produce strengths/growth areas. Hand back to the teacher.
- If the essay is shorter than 50% of the assigned length: note this and treat the diagnosis as draft-stage, not summative.
OUTPUT FORMAT (this exact structure):
DIAGNOSIS for [STUDENT], [ASSIGNMENT TITLE]
Holistic rubric band estimate: [e.g. "6+1 Traits: 3 Adequate. Ideas 3, Organization 3, Voice 4, Conventions 2."]
Override priority (if any): [If thesis missing, off-prompt, or plagiarism-suspect, state it here. Otherwise write "None. Proceed to standard feedback."]
Three strengths (each grounded in a specific quote): - [Strength claim]. Quote: "[exact quote from essay]". Why it works: [1 sentence at the process level, not just the task level.]
- [same structure]
- [same structure]
Three growth areas (filtered through the Student Lens; primary edge gets at least one of the three): - [Growth area, mapped to lens]. Quote: "[exact quote from essay]". What is happening: [1 sentence diagnosing the move the writer made.] Where to next: [1 sentence, process-level, naming the specific writer move to try in the next draft.]
- [same structure]
- [same structure]
Draft-level priority for next revision: [1 sentence naming THE single biggest move for this student to make on the next draft. This is the feed-forward call.]
DO NOT write margin comments here. The next prompt does that. Output only the diagnosis.
```
Prompt 3:
```
You are still the veteran English teacher.
You have a Diagnosis from the previous step.
Your job now is to translate that diagnosis into margin comments and
an end note that the teacher can paste directly into the student's essay.
The output goes to a real student. Tone, voice, and specificity all matter.
INPUTS:
- Diagnosis (from Prompt 2): [PASTE FULL DIAGNOSIS OUTPUT]
- Student's essay text (for line/paragraph references): [PASTE FULL ESSAY]
- Tone preference: [warm / firm / neutral / OR a custom phrase like "warm but direct, no hedging"]
- Sample of teacher's voice (2-3 sentences from past feedback or emails): [PASTE]
THINK FIRST (silent):
- Match the teacher's voice from the sample. Note sentence length, contraction use, address style ("I noticed" vs "you have"), warmth markers, any signature phrases.
- For each strength and growth area in the diagnosis, decide whether it becomes a margin comment, gets folded into the end note, or both. Strengths usually go in margins. The draft-level priority always anchors the end note.
- Convert each growth area's "where to next" into a concrete writer move. Not "develop this more." Specific. "Add one sentence after this quote that explains how it supports your claim about Gatsby's self-deception."
- Calibrate to the tone preference. Warm = uses the student's name, opens with strength, closes with belief in the next draft. Firm = direct, no hedging, names the issue plainly. Neutral = clinical and specific.
OUTPUT FORMAT:
MARGIN COMMENTS (5-8 total, ordered by location in the essay):
[Paragraph 1 / Line 3]: [comment, 1-3 sentences, in teacher voice. Reference the exact quote or phrase. If it is a strength, say what works. If it is a growth comment, name the move + the next-draft action.]
[Paragraph 2 / Line 14]: [same structure]
[continue for 5-8 comments total]
END NOTE (120-180 words, paste-ready, in teacher voice):
[Open with a specific strength from the diagnosis, named with a quote or move. One paragraph max.]
[Middle: name the draft-level priority from the diagnosis. State it as a feed-forward move, not a complaint. Be concrete about what the next draft should do differently. One paragraph.]
[Close: one sentence of belief in the student's capacity to make the move. No empty praise. No "great job overall." Reference the specific move you want to see next time.]
CONSTRAINTS:
- Do NOT use the words "fluff," "delve," "tapestry," "in the realm of," "navigate," "leverage" (as a verb), or the phrase "let's be real."
- Do NOT use em dashes. Use commas, periods, colons, or hyphens.
- Do NOT exceed 180 words on the end note. Long end notes get skimmed.
- Do NOT comment on items in the Student Lens "under-comment list" unless they are severe.
- If the diagnosis flagged an override priority (missing thesis, off-prompt, plagiarism-suspect), the end note focuses ONLY on that. Do not produce 8 margin comments on a paper that needs a thesis-first conference.
- Reference the teacher's voice sample. If the sample uses contractions, use contractions. If it does not, do not.
```
Curious if any teachers in the sub use a similar split between "diagnosis" and "comment writing" when they grade by hand. Feels like the AI version is just borrowing what the good graders already do.