
I don't know how to code, but I'm automating my class notes with AI. Here's what I've discovered (and where I need help).
This winter, I started a personal project to fully automate the creation of structured university notes. The end goal is a pipeline that takes lecture slides and audio recordings, and generates a clean, study-ready LaTeX document without losing a single detail from the professor.
Since I heavily "vibe-coded" the whole thing with AI assistants, the current workflow works, but the architecture is fragmented between scripts and manual chat copy-pasting, and the API costs are starting to add up.
Here is my current workflow:
1. Slide to LaTeX Extraction (Python Script + Claude Sonnet API)
I built a script (GitHub repo here: https://github.com/Risiko200/pdf-to-latex-converter) that takes the PDF slides and uses the Claude Sonnet 4.5 API to transcribe them into a LaTeX skeleton, keeping the structure intact (\section headings, itemize environments).
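For anyone curious what the core of such a script looks like, here is a minimal sketch using the Anthropic Python SDK, sending the PDF as a base64 document block. The prompt text and the `claude-sonnet-4-5` model id are my assumptions (check the current model list); the actual repo may differ.

```python
import base64

# Placeholder instruction -- in practice this would be a much more
# detailed prompt describing the desired skeleton.
SLIDE_PROMPT = (
    "Transcribe these lecture slides into a LaTeX skeleton. "
    "Preserve the structure with \\section headings and itemize "
    "environments. Return only LaTeX."
)

def build_messages(pdf_bytes: bytes) -> list:
    """Build the Claude message payload: the PDF as a base64 document
    block followed by the transcription instruction."""
    return [{
        "role": "user",
        "content": [
            {"type": "document",
             "source": {"type": "base64",
                        "media_type": "application/pdf",
                        "data": base64.b64encode(pdf_bytes).decode()}},
            {"type": "text", "text": SLIDE_PROMPT},
        ],
    }]

def slides_to_latex(pdf_path: str, model: str = "claude-sonnet-4-5") -> str:
    """Send the slide PDF to Claude and return the LaTeX skeleton."""
    import anthropic  # pip install anthropic
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    with open(pdf_path, "rb") as f:
        messages = build_messages(f.read())
    resp = client.messages.create(model=model, max_tokens=8192,
                                  messages=messages)
    return resp.content[0].text
```

Sending the PDF as a document block lets the model see both the rendered layout and the embedded text, which helps it keep the \section/itemize structure intact.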
2. Audio Transcription & Cleanup (Manual via Gemini Web Chat)
I record the lecture, get the raw transcript, and manually paste it into the Gemini Web UI, using this prompt to transform it into academic prose with a strict "zero summarization" policy:
Act as a top-tier university student and an expert study assistant.
Your task is to process the transcription of a lecture and transform it into structured, clean, and study-ready academic prose.
CRITICAL RULE: ZERO SUMMARIZATION POLICY.
Your core objective is text refinement, not reduction. You must retain 100% of the original informational density, nuances, full explanations, tangential discussions, and specific examples. Maintain the exact depth and length of the original concepts. If the professor spends 5 minutes explaining a single concept, your notes must reflect that exhaustive level of detail. DO NOT condense, compress, or summarize.
Follow these guidelines strictly:
- Output Format (Direct Plain Text): Write the output directly in the chat interface. DO NOT enclose the output in a Markdown code block (strictly NO ```). Use standard native formatting.
- Text Cleaning without Truncating: Fix grammar, remove vocal hesitations (e.g., "uhm", "like"), correct false starts, and rewrite messy spoken sentences into professional, direct, and clear academic prose. Keep all examples, personal anecdotes, and classroom Q&A fully intact and smoothly integrated. You are editing for flow, not for length.
- Paragraph-Driven Structure: Rely primarily on cohesive paragraphs, not lists. Break down the lecture into logical sections using standard headings (##) and subheadings (###). Group related ideas into distinct paragraphs. Start a new paragraph when the focus shifts, maintaining a fluid, narrative academic style.
- Strictly Limited Lists: Do not overuse bullet points; write in paragraphs by default. Use bullet points or numbered lists STRICTLY and ONLY when the professor explicitly enumerates a specific list of elements, factors, or a chronological step-by-step process. All other explanations must remain in comprehensive paragraph form. Emphasize keywords, technical terms, dates, and important names using bold text.
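This step doesn't have to stay manual: the same prompt can go through the Gemini API via the `google-generativeai` SDK (Flash models have a free tier with rate limits, which may well cover one lecture a day). Here is a rough sketch; the model name, chunk sizes, and the `CLEANUP_PROMPT` placeholder are my assumptions, not tested values. Long transcripts are split into overlapping chunks so each request stays a manageable size without cutting sentences mid-thought.

```python
# Paste the full "zero summarization" prompt from above here.
CLEANUP_PROMPT = "Act as a top-tier university student ..."

def chunk_transcript(text: str, max_chars: int = 30_000,
                     overlap: int = 500) -> list:
    """Split a long transcript into overlapping chunks, preferring to
    break on paragraph boundaries inside each window."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # back up to the nearest paragraph break inside the window
            cut = text.rfind("\n\n", start, end)
            if cut > start:
                end = cut
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = max(end - overlap, start + 1)
    return chunks

def clean_transcript(raw: str, model: str = "gemini-1.5-flash") -> str:
    """Run each chunk through Gemini with the cleanup prompt and join
    the refined pieces back together."""
    import google.generativeai as genai  # pip install google-generativeai
    # genai.configure(api_key=...) or set GOOGLE_API_KEY in the environment
    llm = genai.GenerativeModel(model)
    cleaned = []
    for chunk in chunk_transcript(raw):
        resp = llm.generate_content(CLEANUP_PROMPT + "\n\n" + chunk)
        cleaned.append(resp.text)
    return "\n\n".join(cleaned)
```

The overlap gives each chunk a little trailing context from the previous one, which reduces the chance of a concept being refined without its setup.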
3. Text to LaTeX Integration (Manual via Gemini Web Chat)
Still in the Gemini Web UI, I paste the LaTeX skeleton from Step 1 and the cleaned text from Step 2. I use this second prompt to inject the prose exactly where it belongs (under sections, inside items, below figures):
Act as an expert academic and a specialized LaTeX programmer.
Your task is to integrate discursive text from a book into a LaTeX slide skeleton (structured with \subsection).
Strictly follow these insertion rules step-by-step:
- GENERAL Integration (Outside lists): Read the \subsection title. If the book text contains introductory information relevant to that title, write a summarized paragraph and insert it exactly between the \subsection{...} command and the start of the list \begin{itemize}.
- SPECIFIC Integration (Inside bullet points): Match the detailed concepts from the book to the corresponding bullet point. Insert them inside the \begin{itemize} environment, immediately after the text of the relevant \item command. NEVER paste disconnected text at the end of the slide.
- FIGURE Integration (& Utilizing Image Comments): Whenever you encounter a figure environment (e.g., \begin{figure}...\end{figure} or a standalone \includegraphics), carefully read the % comments preceding or within the figure block that describe the image. Extract relevant information from the book based on BOTH the \subsection title and these % comment descriptions. Write a concise summarized paragraph and insert it exactly BELOW the \end{figure} command.
- GUIDING COMMENTS (Scattered Hints): Actively look for and follow any scattered % comments throughout the slide skeleton. Use them to strictly guide your text placement, but DO NOT convert the text of these comments into visible slide text.
- CITATION STRIPPING (Clean Output): The provided book text may contain inline citation tags. You MUST completely remove and ignore all such tags in your final generated text. Do not print them.
- SKELETON Inviolability: The text, environments, commands, and all pre-existing % comments already present in the LaTeX skeleton must not be modified, summarized, or deleted under any circumstances. Leave all original hints intact.
- FORMATTING (Italics & Conciseness): Any text added from the book (under the subsection, inside the lists, or under the figures) must be summarized in a discursive but highly concise way (avoiding huge walls of text). Ensure it is strictly enclosed within the \textit{...} command. Apply \textbf{...} to 1 or 2 core keywords inside your added text to improve readability.
- LATEX Semantics and Output: Ensure the final code is perfectly compilable by escaping special characters (%, &, $, _) within your newly generated text. Do not break Beamer frame boundaries with excessive text length. Return solely and exclusively the updated LaTeX code within a code block, without any chatter, conversational filler, or introductions.
- Slide-Extraction Readiness (Modularity): Ensure that every single concept, sub-topic, or example is isolated within its own cleanly separated paragraph. This strict modularity is critical so that another system can extract these distinct paragraph blocks and map them directly to presentation slides.
4. Image Handling (Manual)
I manually crop the images from the slides and insert them into the document.
My Bottlenecks / Need Advice
I want to script this entire thing in Python without breaking the bank. I'm looking for community advice on these specific points:
- Slashing API costs (Slide Extraction): Claude Sonnet costs me about $0.15 per 100 slides. Are there cheaper API models, or local open-source models (Ollama?), that are just as reliable at strictly outputting valid LaTeX syntax from an image/pdf?
- Automating the Gemini part: Right now, I use the free Gemini Web UI for Steps 2 and 3 because pasting huge audio transcripts into a paid API would cost a fortune. How would you orchestrate this in a Python script cost-effectively?
- Automated Image Extraction: Pure LLMs can't crop images out of PDFs. What is the smartest Python library/method to extract graphs and images from slide PDFs into a folder, so I can automate Step 4?
Any feedback on libraries, alternative workflows, or better models is highly appreciated. Thanks!