
2 questions about the usage-based billing preview: caching and long context
- Does the usage-based preview incorporate caching, in your opinion?
- Is the context window being factored in?
We know that long-context, non-cached prompts can cost 10x as much as shorter and/or cached prompts (roughly under 272K tokens, and cached) depending on the provider. We also know that GHCP generally summarizes your input, almost always to avoid breaching the 272K context length and to take advantage of that pricing window.
Using GPT 5.4 as an example of something I've used heavily in April, and assuming I was working on the same code base for the entire month, I could be saving anywhere from nothing up to 10x on my ~USD 650 estimated future monthly bill.
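To make the arithmetic concrete, here's a rough sketch of how cache hit rate would swing that bill. All rates and token volumes below are hypothetical placeholders I made up for illustration, not actual provider or GHCP pricing:

```python
# Hypothetical illustration of how caching changes a monthly bill.
# Rates and volumes are made-up placeholders, NOT real pricing.
UNCACHED_RATE = 10.0  # $ per 1M input tokens (hypothetical)
CACHED_RATE = 1.0     # $ per 1M cached input tokens (hypothetical 10x discount)

def monthly_bill(total_input_mtok: float, cache_hit_fraction: float) -> float:
    """Estimated monthly input cost given what fraction of tokens hit the cache."""
    cached = total_input_mtok * cache_hit_fraction
    uncached = total_input_mtok * (1 - cache_hit_fraction)
    return cached * CACHED_RATE + uncached * UNCACHED_RATE

# Same 65M input tokens per month, different cache behavior:
worst = monthly_bill(65, 0.0)  # sporadic prompts, no cache hits -> 650.0
best = monthly_bill(65, 1.0)   # same codebase all month, fully cached -> 65.0
```

The point is just that the same monthly token volume can land anywhere between those two extremes depending on how the billing preview models cache hits.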
Am I being too optimistic in assuming GH took liberties here, i.e. assumed the average user makes sporadic prompts that don't leverage the provider's caching mechanism much?