
2 questions about the usage-based billing preview: caching and long context
- Does the usage-based preview incorporate caching, in your opinion?
- Is the context window being factored in?
We know that long-context, non-cached prompts can cost 10x as much as shorter and/or cached prompts (roughly under 272K tokens, and cached) depending on the provider. We also know that GHCP generally summarizes your input, almost always to avoid breaching the 272K context length and to take advantage of that pricing window.
Using GPT 5.4 as an example of something I've used heavily in April, and assuming I was working on the same code base for the entire month, I could be saving anywhere from nothing up to 10x on my ~USD 650 estimated future monthly bill.
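To make the arithmetic concrete, here's a rough sketch of how cache hit rate would swing that bill. All rates and token volumes below are hypothetical placeholders I made up for illustration, not actual provider or GHCP pricing:

```python
# Hypothetical illustration of how caching changes a monthly bill.
# Rates and volumes are made-up placeholders, NOT real pricing.
UNCACHED_RATE = 10.0  # $ per 1M input tokens (hypothetical)
CACHED_RATE = 1.0     # $ per 1M cached input tokens (hypothetical 10x discount)

def monthly_bill(total_input_mtok: float, cache_hit_fraction: float) -> float:
    """Estimated monthly input cost given what fraction of tokens hit the cache."""
    cached = total_input_mtok * cache_hit_fraction
    uncached = total_input_mtok * (1 - cache_hit_fraction)
    return cached * CACHED_RATE + uncached * UNCACHED_RATE

# Same 65M input tokens per month, different cache behavior:
worst = monthly_bill(65, 0.0)  # sporadic prompts, no cache hits -> 650.0
best = monthly_bill(65, 1.0)   # same codebase all month, fully cached -> 65.0
```

The point is just that the same monthly token volume can land anywhere between those two extremes depending on how the billing preview models cache hits.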
Am I being too optimistic in assuming GH took liberties here, i.e. assumed the average user makes sporadic prompts that don't leverage the provider's caching mechanism much?