POST /chat/completions

Authorizations

Authorization
string, header, required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
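As a sketch, the required headers can be assembled like this in Python (the token value is a placeholder; substitute your own auth token):

```python
# Build the headers for a chat completions request.
# "YOUR_AUTH_TOKEN" is a hypothetical placeholder, not a real credential.
token = "YOUR_AUTH_TOKEN"

headers = {
    "Authorization": f"Bearer {token}",   # Bearer authentication header
    "Content-Type": "application/json",   # the body is application/json
}
```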

Body

application/json
messages
object[], required

A list of messages comprising the conversation so far.

model
enum<string>, required
default: meta-llama/Meta-Llama-3.1-8B-Instruct

The name of the model to query.

Available options:
meta-llama/Meta-Llama-3.1-405B-Instruct,
meta-llama/Meta-Llama-3.1-70B-Instruct,
meta-llama/Meta-Llama-3.1-8B-Instruct,
meta-llama/Llama-3.2-1B-Instruct,
meta-llama/Llama-3.2-3B-Instruct,
nvidia/Llama-3.1-Nemotron-70B-Instruct,
google/gemma-2-27b-it,
google/gemma-2-9b-it,
Qwen/Qwen2.5-72B-Instruct,
Qwen/Qwen2.5-Coder-32B-Instruct,
Qwen/Qwen2.5-7B-Instruct,
Qwen/Qwen2-72B-Instruct,
Gryphe/MythoMax-L2-13b,
gpt-4o-2024-11-20,
gpt-4o-2024-08-06,
gpt-4o-mini,
o1-preview,
o1-mini,
claude-3-5-sonnet-v2@20241022,
claude-3-5-sonnet@20240620
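The two required fields, messages and model, are enough for a minimal request body. A sketch (the model name is the documented default; the role/content shape of each message is an assumption following the common OpenAI-compatible convention, since this page does not spell it out):

```python
import json

# Minimal request body: only `messages` and `model` are required.
payload = {
    # Default model from the enum above.
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    # Assumed message shape: {"role": ..., "content": ...} per the
    # OpenAI-compatible convention.
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

# Serialize for an application/json request body.
body = json.dumps(payload)
```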
frequency_penalty
number
default: 0.5

Penalizes new tokens based on how frequently they already appear in the generated text so far, reducing repetition.
max_tokens
integer
default: 512

The maximum number of tokens to generate.

Required range: 1 < x < 4096
n
integer
default: 1

The number of generations to return.

stop
string[]

A list of string sequences that will truncate (stop) inference text output.

stream
boolean
default: false

If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].
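A streamed response can be consumed by reading each data: line, decoding its JSON payload, and stopping at the [DONE] sentinel. A sketch with canned lines (the choices/delta chunk shape is an assumption following the OpenAI-compatible streaming convention):

```python
import json

def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from `data: {...}` Server-Sent Event lines.

    Stops when the `data: [DONE]` sentinel is reached.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

# Canned lines, shaped as they might appear in a streamed response.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

# Concatenate the incremental content deltas into the full text.
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_chunks(sample)
)
```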

temperature
number
default: 0.7

Determines the degree of randomness in the response.

top_k
number
default: 50

Limits sampling to the k most likely next tokens.

top_p
number
default: 0.7

The top_p (nucleus sampling) parameter dynamically adjusts the number of candidate tokens considered for each predicted token, based on their cumulative probability.
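Putting the optional parameters together, a full request body might look like the sketch below. The values shown for max_tokens, n, temperature, top_p, top_k, frequency_penalty, and stream are the documented defaults; the stop sequence is an arbitrary illustrative choice:

```python
# Request body combining the optional sampling parameters with their
# documented defaults from this page.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "max_tokens": 512,          # default: 512, range 1 < x < 4096
    "n": 1,                     # default: 1 generation
    "temperature": 0.7,         # default: 0.7
    "top_p": 0.7,               # default: 0.7
    "top_k": 50,                # default: 50
    "frequency_penalty": 0.5,   # default: 0.5
    "stop": ["\n\n"],           # illustrative stop sequence (assumption)
    "stream": False,            # default: false
}
```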

Response

200 - application/json
choices
object[]

created
integer

id
string

model
string

object
enum<string>
Available options: chat.completion

usage
object
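A sketch of extracting the generated text from a 200 response. The top-level fields follow the schema above; the inner shape of each choices element (message, finish_reason) and of usage (token counts) is an assumption following the OpenAI-compatible convention, and all sample values are illustrative:

```python
import json

# Illustrative 200 response body; values are made up for the example.
response_json = """{
  "id": "cmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Paris."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}"""

resp = json.loads(response_json)

# The generated text lives in the first choice's message content
# (assumed OpenAI-compatible shape).
answer = resp["choices"][0]["message"]["content"]
```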