how to save 80% on your claude bill with better context
been building web apps with claude lately and those token limits have honestly started hitting me too. i'm using claude 4.6 sonnet for a research tool, but feeding it raw web data was absolutely nuking my limits.
i'm putting together the stuff that actually worked for me to save tokens and keep the bill down:
switch to markdown first. stop sending raw html. use tools like firecrawl to strip out the nested divs and script junk so you only pay for the actual text.
don't let your prompt cache go cold. anthropic's prompt caching is a huge relief, but it only works if your data is consistent.
watch out for the 200k token "premium" jump. anthropic now charges nearly double for inputs over 200k tokens on the new opus/sonnet 4.6 models. keep your context under that limit to avoid the surcharge
strip the nav and footer. the website's "about us" and "careers" links in the footer are just burning your money every time you hit send.
use jina reader for quick hits. for simple single-page reads, jina is a great way to get a clean text version without the crawler bloat.
truncate your context. if a documentation page is 20k words, just take the first 5k. most of the "meat" is usually at the top anyway.
clean your data with unstructured.io. if you are dealing with messy pdfs alongside web data, this helps turn the chaos into a clean schema claude actually understands.
map before you crawl. don't scrape every subpage blindly. i use the map feature in firecrawl to find the specific documentation urls that actually matter for your prompt, if you use another tool, prefer doing this.
use haiku for the "trash" work. use claude 4.5 haiku to summarize or filter data before feeding it into the expensive models like opus.
use smart chunking. use llama-index to break your data into semantic chunks so you only retrieve the exact paragraph the ai needs for that specific prompt.
cap your "extended thinking" depth. for opus 4.6, set thinking: {type: "adaptive"} with effort: "low" or "medium". the old budget_tokens param is deprecated on 4.6. thinking tokens are billed at the output rate, so if you leave effort on high, claude thinks hard on every single reply including the simple ones and your bill will hurt.
set hard usage limits. set your spending tiers in the anthropic console so a buggy loop doesn't drain your bank account while you're asleep.
feel free to roast my setup or add better tips if you have thembeen building web apps with claude lately and those token limits have honestly started hitting me too. i'm using claude 4.6 sonnet for a research tool, but feeding it raw web data was absolutely nuking my limits.
