Create chat completion
This function processes chat completion requests by determining whether to use streaming or non-streaming response handling based on the request payload. For streaming requests, it configures additional options to track token usage.
Returns
Returns a Response containing either:
- A streaming SSE connection for real-time completions
- A single JSON response for non-streaming completions
Errors
Returns an error status code if:
- The request processing fails
- The streaming/non-streaming handlers encounter errors
- The underlying inference service returns an error
Authorizations
Bearer authentication header of the form Bearer <token>
, where <token>
is your auth token.
Body
A list of messages comprising the conversation so far
ID of the model to use
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far
Controls how the model responds to function calls
A list of functions the model may generate JSON inputs for
Modify the likelihood of specified tokens appearing in the completion
The maximum number of tokens to generate in the chat completion
How many chat completion choices to generate for each input message
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far
The format to return the response in
If specified, our system will make a best effort to sample deterministically
Up to 4 sequences where the API will stop generating further tokens
Whether to stream back partial progress. Must be false for this request type.
What sampling temperature to use, between 0 and 2
Controls which (if any) tool the model should use
A list of tools the model may call
An alternative to sampling with temperature
A unique identifier representing your end-user
Response
A list of chat completion choices.
The Unix timestamp (in seconds) of when the chat completion was created.
A unique identifier for the chat completion.
The model used for the chat completion.
The system fingerprint for the completion, if applicable.
Usage statistics for the completion request.