Chat Completions
Creates a model response for the given chat conversation.
POST /chat/completions
Authorizations
Authorization
string
header, required
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
application/json
model
enum<string>
default: meta-llama/Meta-Llama-3.1-8B-Instruct
required
The name of the model to query.
Available options:
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Llama-3.2-3B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
google/gemma-2-9b-it
google/gemma-2-27b-it
Qwen/Qwen2-72B-Instruct
Qwen/Qwen2-7B-Instruct
Gryphe/MythoMax-L2-13b
gpt-4o-2024-08-06
gpt-4o-mini
gpt-4-turbo-2024-04-09
gpt-3.5-turbo-0125
claude-3-5-sonnet@20240620
openai-o1-mini
openai-o1-preview
messages
object[]
required
A list of messages comprising the conversation so far.
stream
boolean
default: false
If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].
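A minimal sketch of consuming the stream, assuming the OpenAI-style event shape where each line is `data: <json>` and each chunk carries a `choices[0].delta.content` fragment (the delta field names are an assumption; this page does not spell out the chunk schema). The sample lines below stand in for a real event stream.

```python
import json

# Illustrative stand-in for the raw SSE lines; not actual API output.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Concatenate content deltas until the data: [DONE] sentinel."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments / blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # stream terminator per the description above
        event = json.loads(payload)
        text.append(event["choices"][0]["delta"].get("content", ""))
    return "".join(text)

print(collect_stream(sample_lines))  # → Hello
```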
max_tokens
integer
default: 512
The maximum number of tokens to generate.
stop
string[]
A list of string sequences that will truncate (stop) inference text output.
temperature
number
default: 0.7
Determines the degree of randomness in the response.
top_p
number
default: 0.7
The top_p (nucleus) parameter dynamically adjusts the number of choices for each predicted token based on the cumulative probabilities.
top_k
number
default: 50
frequency_penalty
number
default: 0.5
n
integer
default: 1
Number of generations to return.
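Putting the body parameters together, a request payload might look like the sketch below. The values shown are the documented defaults; the message contents are illustrative, and the surrounding HTTP call (base URL, auth header) is deployment-specific and omitted.

```python
import json

# Sketch of a chat completions request body using the parameters above.
# All numeric values are the documented defaults; messages are illustrative.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # default model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "stream": False,
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,
}

body = json.dumps(payload)  # serialized application/json request body
```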
Response
200 - application/json
id
string
choices
object[]
usage
object
created
integer
model
string
object
enum<string>
Available options:
chat.completion
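The fields above can be read from the parsed JSON as sketched below. The inner structure of choices and usage is assumed to follow the common OpenAI-compatible shape (message.content, finish_reason, token counts), since this page lists them only as object[] and object; sample_response is illustrative, not actual API output.

```python
# Illustrative 200 response; field values are made up for the example.
sample_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1723500000,
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

# Pull out the generated text and the token accounting.
answer = sample_response["choices"][0]["message"]["content"]
total = sample_response["usage"]["total_tokens"]
```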