u/stingrayer

Hi,

We deployed a basic Chat GPT-5.4 model to one of Azure's smaller regions and began testing a few agentic tools. Last week we noticed an increase in response times, roughly doubling from a 20s average to 40+s. In the model's monitoring pane we see spikes in both time to first byte and time to last byte. On Friday, time to first byte spiked by a factor of 7. We are now trying to understand whether these fluctuations are a result of our minor prompt adjustments or are caused by Azure infrastructure.
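
For context, here is a minimal sketch of how the two causes could be told apart, assuming we log latencies for a fixed, never-edited control prompt alongside our real (adjusted) prompts. The function names, the idea of a control prompt, and the 1.5x threshold are all hypothetical choices for illustration, not anything Azure provides. If the control prompt slows down too, the change is probably on the hosting side; if only the adjusted prompts slow down, the prompt edits are the more likely cause.

```python
from statistics import median


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return ordered[rank - 1]


def classify_slowdown(control_before, control_after,
                      prod_before, prod_after,
                      threshold=1.5, stat=median):
    """Guess the likely source of a latency increase.

    control_*: latencies (seconds) for a fixed, unchanged prompt (assumption).
    prod_*:    latencies (seconds) for the real prompts we have been adjusting.
    threshold: hypothetical ratio above which we call it a slowdown.
    stat:      summary statistic to compare; pass p95 to focus on spikes.
    """
    control_ratio = stat(control_after) / stat(control_before)
    prod_ratio = stat(prod_after) / stat(prod_before)
    if control_ratio >= threshold:
        # Even the untouched prompt got slower, so prompt edits cannot explain it.
        return "infrastructure"
    if prod_ratio >= threshold:
        # Only the adjusted prompts got slower.
        return "prompt"
    return "stable"


if __name__ == "__main__":
    # Made-up numbers purely to show the call shape.
    print(classify_slowdown([1.0, 1.1, 0.9], [2.0, 2.2, 2.1],
                            [20.0, 21.0, 19.0], [40.0, 42.0, 41.0]))
```

This only compares two time windows with a single ratio, so it would need enough samples per window to be meaningful.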

Can anyone with experience comment on how consistent and reliable the model hosting service is? For example, should we expect response times from the model to fluctuate constantly?

Thanks!

Consistency of Foundry Hosted LLMs?