Chat Completions
Creates a model response for the given chat conversation.
POST
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
application/json
messages
A list of messages comprising the conversation so far.
model
The name of the model to query.
Available options:
meta-llama/Meta-Llama-3.1-405B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Llama-3.2-3B-Instruct
nvidia/Llama-3.1-Nemotron-70B-Instruct
google/gemma-2-27b-it
google/gemma-2-9b-it
Qwen/Qwen2.5-72B-Instruct
Qwen/Qwen2.5-Coder-32B-Instruct
Qwen/Qwen2.5-7B-Instruct
Qwen/Qwen2-72B-Instruct
Gryphe/MythoMax-L2-13b
gpt-4o-2024-11-20
gpt-4o-2024-08-06
gpt-4o-mini
o1-preview
o1-mini
claude-3-5-sonnet-v2@20241022
claude-3-5-sonnet@20240620
max_tokens
The maximum number of tokens to generate.
Required range: 1 < x < 4096
n
Number of generations to return.
stop
A list of string sequences that will truncate (stop) inference text output.
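The truncation semantics can be illustrated client-side (a sketch of the behavior, not the server's implementation; the helper name is hypothetical):

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    # Cut the text at the earliest occurrence of any stop sequence.
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Generation halts before the "\nUser:" turn marker is emitted.
print(truncate_at_stop("Hello world.\nUser:", ["\nUser:", "###"]))  # Hello world.
```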
stream
If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].
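A client can consume the stream by reading data: lines until the [DONE] sentinel. The sketch below parses pre-split event lines; the chunk payload shape (choices[0].delta.content) is assumed to follow the common OpenAI-style streaming format, which this page does not spell out:

```python
import json

def parse_sse_chunks(lines):
    # Yield the JSON payload of each Server-Sent Event until [DONE].
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Simulated event lines as they would arrive over the wire.
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(e["choices"][0]["delta"]["content"] for e in parse_sse_chunks(events))
print(text)  # Hello
```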
temperature
Determines the degree of randomness in the response.
top_p
The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities.
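Putting the parameters together, a request body might look like this (a sketch assuming the OpenAI-compatible JSON shape implied above; the chosen values are illustrative and no request is sent here):

```python
import json

# Assemble a chat completion request body from the parameters above.
body = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 256,    # must fall within the required range
    "n": 1,               # number of generations to return
    "stop": ["\nUser:"],  # optional stop sequences
    "stream": False,      # set True to receive Server-Sent Events
    "temperature": 0.7,   # degree of randomness
    "top_p": 0.9,         # nucleus sampling cutoff
}
print(json.dumps(body, indent=2))
```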