
Hey everyone 👋
Was reading the DeepSeek V4 docs this morning and noticed they kept prefill support for chat completions. For anyone who hasn't used it: prefill lets you pass an assistant message with prefix=True, and the model continues from your prefix instead of generating its own opener.
Their example forces the model into a Python code block: you pass "```python\n" as the assistant prefix and set stop=["```"]. The model has no choice but to start with Python code. No preamble, no "sure, here is the code," just the function. That alone solves half the structured-output problems I deal with on production agents.
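For reference, the snippet in their docs looks roughly like this (from memory, so check the link at the bottom; note that prefix completion lives under their beta base URL):

```python
from openai import OpenAI

# Chat prefix completion is served under DeepSeek's beta base URL.
client = OpenAI(
    api_key="<your api key>",
    base_url="https://api.deepseek.com/beta",
)

messages = [
    {"role": "user", "content": "Please write quick sort code"},
    # prefix=True tells the API to continue from this partial assistant
    # turn, so the first generated token is already inside the code fence.
    {"role": "assistant", "content": "```python\n", "prefix": True},
]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    # Stop at the closing fence so you get bare code and nothing after it.
    stop=["```"],
)

# The response contains only the continuation, not the prefix you supplied.
print(response.choices[0].message.content)
```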
The reason this matters more than it sounds: most of the major providers quietly dropped this capability over the last year. OpenAI never had it on chat completions. Anthropic had it, then made it harder to use. Google's Gemini API has nothing equivalent. The pattern is clear: providers would rather you go through their structured-output APIs, which are easier for them to monetize and limit.
Prefill is the most reliable way I have found to constrain model behavior in agent loops where you need exact format compliance. JSON schemas help, function calling helps, but prefill is the only mechanism that removes the problem outright: the model never gets to choose its own opening tokens. A sketch of what that looks like for JSON output is below.
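A minimal sketch of that pattern. The prompt and prefix here are mine, not from the docs, and I am assuming (as in their code-block example) that the response contains only the continuation, so you stitch the prefix back on before parsing:

```python
import json

from openai import OpenAI

client = OpenAI(
    api_key="<your api key>",
    base_url="https://api.deepseek.com/beta",
)

# Seed the assistant turn with the start of a JSON object so the model
# cannot emit any preamble before it. (Hypothetical extraction task.)
prefix = '{"name":'
messages = [
    {"role": "user", "content": "Return a JSON object with the name and "
     'email from: "Reach me, Ada Lovelace, at ada@example.com"'},
    {"role": "assistant", "content": prefix, "prefix": True},
]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
)

# Stitch the prefix back on before parsing, since the API returns only
# the tokens generated after it. In a real agent loop you would still
# validate and retry on a parse failure.
data = json.loads(prefix + response.choices[0].message.content)
print(data)
```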
Anyone else been working around the loss of prefill on other providers? Curious what the workaround patterns look like, beyond "ask the model nicely and hope it follows instructions."
More here: https://api-docs.deepseek.com/guides/chat_prefix_completion