POST /chat/completions

Authorizations

Authorization
string
header, required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
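The header can be sketched as a plain dictionary; the environment-variable name below is illustrative, not part of this API:

```python
import os

def build_headers(token: str) -> dict:
    """Return the headers this endpoint requires: the Bearer auth
    header plus the JSON content type used by the request body."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# API_TOKEN is an assumed variable name; substitute your own.
headers = build_headers(os.environ.get("API_TOKEN", "example-token"))
```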

Body

application/json
model
enum<string>
default: meta-llama/Meta-Llama-3.1-8B-Instruct
required

The name of the model to query.

Available options:
meta-llama/Llama-3.2-1B-Instruct,
meta-llama/Llama-3.2-3B-Instruct,
meta-llama/Meta-Llama-3.1-8B-Instruct,
meta-llama/Meta-Llama-3.1-70B-Instruct,
google/gemma-2-9b-it,
google/gemma-2-27b-it,
Qwen/Qwen2-72B-Instruct,
Qwen/Qwen2-7B-Instruct,
Gryphe/MythoMax-L2-13b,
gpt-4o-2024-08-06,
gpt-4o-mini,
gpt-4-turbo-2024-04-09,
gpt-3.5-turbo-0125,
claude-3-5-sonnet@20240620,
openai-o1-mini,
openai-o1-preview

messages
object[]
required

A list of messages comprising the conversation so far.
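Each message is an object with a role and a content string, following the OpenAI-compatible convention (the example conversation below is made up):

```python
# Roles are typically "system", "user", or "assistant".
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
```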

stream
boolean
default: false

If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].
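Consuming the stream can be sketched as follows, assuming each event arrives as a line of the form "data: &lt;json&gt;" and ends with the [DONE] sentinel (the sample chunks below are made up):

```python
import json

def parse_sse_lines(lines):
    """Yield decoded chunk objects from SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield json.loads(payload)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
# Concatenate the incremental deltas into the full response text.
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_lines(sample))
```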

max_tokens
integer
default: 512

The maximum number of tokens to generate.

stop
string[]

A list of string sequences that, when encountered, stop (truncate) the generated text output.

temperature
number
default: 0.7

Determines the degree of randomness in the response.

top_p
number
default: 0.7

The top_p (nucleus sampling) parameter dynamically adjusts the number of candidate tokens considered for each predicted token, based on their cumulative probability.

top_k
number
default: 50

Limits sampling to the k most likely tokens at each step.

frequency_penalty
number
default: 0.5

Penalizes tokens in proportion to how often they have already appeared in the generated text, reducing repetition.

n
integer
default: 1

The number of generations (completions) to return.
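Putting the body parameters together, a full request payload with the documented defaults can be sketched like this (the stop sequence shown is illustrative, not a required value):

```python
import json

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # the documented default
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
    "max_tokens": 512,
    "stop": ["</s>"],          # example stop sequence (assumption)
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "frequency_penalty": 0.5,
    "n": 1,
}
# Serialize to the application/json body the endpoint expects.
body = json.dumps(payload)
```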

Response

200 - application/json
id
string

A unique identifier for the completion.

choices
object[]

The list of generated completions.

usage
object

Token usage counts for the request (prompt, completion, and total tokens).

created
integer

The Unix timestamp (in seconds) of when the completion was created.

model
string

The model used for the completion.

object
enum<string>

The object type, which is always chat.completion.

Available options:
chat.completion
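Reading a non-streaming 200 response can be sketched as follows, assuming the OpenAI-compatible shape listed above (all values in the sample JSON are made up):

```python
import json

# Stand-in for the raw response body returned by the endpoint.
raw = json.dumps({
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1724000000,
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
})

resp = json.loads(raw)
# The generated text lives on the message of each choice.
answer = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
```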