
Built a way to auto-tag every ChatGPT chat by topic!! Runs locally in the browser, no AI calls!!
Disclosure: I'm the developer of AI Toolbox, the browser extension this post describes. Posting because the "no native way to organize ChatGPT chats by topic" problem is widely felt and worth talking about. Link at the bottom per the sub's rules.
Some context first. I've been on ChatGPT daily for about two years. My chat history is somewhere around 600 conversations. Maybe a quarter of them have useful auto-generated titles. The rest are "Untitled chat" or some variation, and I have no idea what's actually in them without opening each one.
For a while I tried to organize this manually using folders, but every system broke down within a week because I had to open each chat to figure out where it belonged. The manual classification work was the bottleneck. So I built the categorization to run automatically.
Why doesn't ChatGPT categorize chats by topic natively?
Genuinely no idea. ChatGPT will auto-generate a title for each chat (often useless), but that's the entire extent of "what is this chat about?" metadata you get. There's no topic tag, no category, no way to filter your history by "show me only the coding conversations" or "show me only the research questions." If you want to find a specific old chat, it's a search across titles (most of which are noise) or scrolling forever.
What does the auto-tagging actually do?
After your first sync, every conversation in your account gets classified into one or more of five built-in categories: Coding, Writing, Research, Math & Science, Business. The sidebar grows a Smart Tags section with colored pills (one per category) showing how many chats fall into each. Click a pill, you get a filtered list of every chat with that tag. Click a chat in the list, it opens in ChatGPT.
Each chat can get up to 3 tags if it spans multiple topics. The math-and-coding chat where you asked ChatGPT to derive something then implement it shows up under both, which is the right answer.
How does the tagging actually work, technically?
This is the part I want to be clear about because it's the question I'd ask if I were reading this. The tagging runs locally in your browser. It does not send your chats to any external AI for classification. Zero outbound API calls for the categorization step.
The detection is pattern-based: each category has a list of keywords and signals it looks for in the title and first 10 messages of a conversation, and the matches are scored against a threshold. Code fences, language names, and SQL patterns trigger Coding. Question patterns like "explain", "what is", "compare" trigger Research. Math operators and scientific terms trigger Math & Science. The patterns are tuned, and yes, they occasionally miscategorize: I see a few chats per 100 that I'd personally classify differently. The accuracy is good, not perfect.
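For anyone who wants to picture the scoring, here's a simplified sketch of the idea. It is not the shipped code: the pattern lists, the weights, and the names `SIGNALS`, `MIN_SCORE`, and `classify` are all made up for illustration. Only the five categories, the title plus first 10 messages window, the threshold, and the 3-tag cap come from the description above.

```typescript
// Illustrative only: patterns, weights, and MIN_SCORE are assumed values.
type Category = "Coding" | "Writing" | "Research" | "Math & Science" | "Business";

interface Signal {
  pattern: RegExp; // no "g" flag, so .test() carries no state between calls
  weight: number;
}

const SIGNALS: Record<Category, Signal[]> = {
  Coding: [
    { pattern: /`{3}/, weight: 3 }, // a code fence
    { pattern: /\b(python|javascript|typescript|rust)\b/i, weight: 2 },
    { pattern: /\bselect\b[\s\S]*\bfrom\b/i, weight: 2 }, // rough SQL shape
  ],
  Writing: [{ pattern: /\b(essay|draft|rewrite|proofread)\b/i, weight: 2 }],
  Research: [{ pattern: /\b(explain|what is|compare)\b/i, weight: 2 }],
  "Math & Science": [
    { pattern: /\b(derive|integral|derivative|theorem|equation)\b/i, weight: 2 },
    { pattern: /\d\s*[+*\/^=]\s*\d/, weight: 1 }, // an operator between digits
  ],
  Business: [{ pattern: /\b(invoice|revenue|marketing|pricing)\b/i, weight: 2 }],
};

const MIN_SCORE = 2;     // assumed threshold
const MAX_TAGS = 3;      // cap from the post
const MAX_MESSAGES = 10; // scan window from the post

function classify(title: string, messages: string[]): Category[] {
  // Only the title and the first 10 messages are scanned.
  const text = [title, ...messages.slice(0, MAX_MESSAGES)].join("\n");

  const scored = (Object.keys(SIGNALS) as Category[]).map((category) => ({
    category,
    score: SIGNALS[category].reduce(
      (sum, s) => sum + (s.pattern.test(text) ? s.weight : 0),
      0,
    ),
  }));

  // Drop weak matches, keep the strongest, cap at three tags.
  return scored
    .filter((s) => s.score >= MIN_SCORE)
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_TAGS)
    .map((s) => s.category);
}
```

With these toy patterns, a chat that asks to derive something and then implement it in Python clears the threshold for both Coding and Math & Science, while a chat whose only signal is a stray "2 + 2" scores below the threshold and gets no tag.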
The results are cached in the extension's local IndexedDB so the classification doesn't re-run on every sidebar open.
Can you make your own tags?
Yes. There's a Custom Tag Rules section where you define a tag name, a list of comma-separated keywords, and a color. The extension matches your keywords against conversation content and tags accordingly. I made one for "Client work" with the client name as the keyword. Works exactly the same way as the built-in tags.
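A rough sketch of how a rule like that can be evaluated, using the word-boundary matching mentioned in the dogfooding notes. The `CustomRule` shape and the helper names are hypothetical, not the extension's actual data model.

```typescript
// Hypothetical rule shape: a name, parsed keywords, and a color.
interface CustomRule {
  name: string;
  keywords: string[];
  color: string;
}

// Turn the comma-separated input field into a clean keyword list.
function parseKeywords(input: string): string[] {
  return input
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A rule matches when any keyword appears as a whole word (case-insensitive).
function ruleMatches(rule: CustomRule, text: string): boolean {
  return rule.keywords.some((k) =>
    new RegExp(`\\b${escapeRegExp(k)}\\b`, "i").test(text),
  );
}

const clientWork: CustomRule = {
  name: "Client work",
  keywords: parseKeywords("Acme, acme corp "), // "Acme" is a placeholder client name
  color: "#f59e0b",
};
```

Here `ruleMatches(clientWork, "Draft the proposal for Acme")` is true, while text containing only "Acmeology" is not matched. One caveat with `\b`: a keyword that starts or ends with a non-word character (say "C++") won't match reliably with this approach and would need different boundary handling.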
A few details from dogfooding
- Up to 3 tags per chat, with a score threshold. Early versions tagged anything that matched any pattern, and I ended up with chats tagged Coding + Writing + Business when they were really just business writing about a coding tool. Added a score threshold (matches have to score above a minimum to count) and capped at 3 tags. Together, the two changes fixed the over-tagging problem.
- Recomputation runs after every sync, not on demand. I tried "recompute on every sidebar open" first and the sidebar got laggy when the history was big. Moved it to a post-sync background pass with a Promise lock so two syncs can't trigger overlapping computation. Sidebar stays snappy.
- Pattern scanning only covers the title and first 10 messages. The first few messages are by far the strongest topic signal in a ChatGPT conversation. Scanning the full conversation buys ~1% accuracy and costs a lot more compute time, so I stopped at 10 messages and accuracy stayed essentially the same.
- Custom rules use word-boundary matching. Naïve substring matching meant my "client work" rule was matching chats about "clients" generally, which wasn't what I wanted. Switched to word-boundary regex and the false-positive rate dropped to roughly zero.
- Color choices are intentional. Blue for Coding because it's the most common dev-tool brand color. Green for Research because it reads as "information." Amber for Math because it stands out from the others. The colors aren't arbitrary; they're cues for fast visual parsing when you have a sidebar full of tag pills.
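For the Promise lock in the recomputation bullet, the pattern looks roughly like this. It's a minimal sketch under one assumption: a second sync that arrives mid-computation simply joins the run already in progress instead of queueing another one. `recomputeTags` and `inFlight` are illustrative names.

```typescript
// Holds the promise of the run currently in progress, if any.
let inFlight: Promise<void> | null = null;

function recomputeTags(work: () => Promise<void>): Promise<void> {
  // A pass is already running: hand back the same promise, start nothing new.
  if (inFlight) return inFlight;

  // The async wrapper turns a synchronous throw in work() into a rejection.
  const run = (async () => {
    await work();
  })();
  inFlight = run;

  // Release the lock whether the pass succeeded or failed.
  const release = () => {
    if (inFlight === run) inFlight = null;
  };
  run.then(release, release);

  return run;
}
```

Two syncs firing back to back get the same promise and trigger a single classification pass. The trade-off is that changes landing during a run wait for the next sync rather than forcing an immediate second pass.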
How does the workflow look?
Open ChatGPT. After the first sync (takes a minute on a long history), the Smart Tags section appears in the sidebar with the five colored pills and your counts. Click a pill to filter the chat list to that category. Click any chat to open it. To add a custom rule, open Custom Tag Rules, type a name, type keywords (comma-separated), pick a color, save. Tags recompute automatically.
For my 600-chat history, the first sync and classification took about 90 seconds. After that, new chats get tagged automatically on the next sync. The "where is that SQL query chat from a month ago" lookup that used to be 5 minutes of scrolling is now a 4-second click on the Coding pill.