Creates a model response for the given chat conversation.
Authorization header: Bearer authentication of the form Bearer <token>, where <token> is your auth token.
model
The name of the model to query. Default: meta-llama/Meta-Llama-3.1-8B-Instruct.
Available models: meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct, nvidia/Llama-3.1-Nemotron-70B-Instruct, google/gemma-2-27b-it, google/gemma-2-9b-it, Qwen/Qwen2.5-72B-Instruct, Qwen/Qwen2.5-Coder-32B-Instruct, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2-72B-Instruct, Gryphe/MythoMax-L2-13b, gpt-4o-2024-11-20, gpt-4o-2024-08-06, gpt-4o-mini, o1-preview, o1-mini, claude-3-5-sonnet-v2@20241022, claude-3-5-sonnet@20240620
messages
A list of messages comprising the conversation so far. 1 to 10 elements.

stream
If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE]. Default: false.
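When streaming is enabled, the response body arrives as Server-Sent Event lines and ends with the data: [DONE] sentinel. A minimal client-side parsing sketch, assuming each event carries OpenAI-style choices[0].delta.content chunks (the exact event shape is an assumption, not confirmed by this reference):

```python
import json

def read_sse_stream(lines):
    """Collect content deltas from a chat-completions SSE stream.

    Each event line looks like 'data: {...json...}'; the stream
    ends with the sentinel 'data: [DONE]'.
    """
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        # Assumed OpenAI-compatible chunk shape:
        delta = event["choices"][0]["delta"].get("content", "")
        chunks.append(delta)
    return "".join(chunks)

# Simulated stream, as an HTTP client might yield it line by line:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(read_sse_stream(sample))  # -> Hello
```

In a real client the lines would come from iterating over the HTTP response body rather than a list.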
max_tokens
The maximum number of tokens to generate. Range: 1 <= x <= 4096. Default: 512.
stop
A list of string sequences that will stop (truncate) inference text output.
temperature
Determines the degree of randomness in the response. Default: 0.7.
top_p
The top_p (nucleus) parameter dynamically adjusts the number of candidate tokens considered for each prediction based on their cumulative probability. Default: 0.7.
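To illustrate what cumulative-probability truncation means, here is a toy sketch (not the server's implementation) that selects the nucleus from a hypothetical next-token distribution:

```python
def nucleus_candidates(probs, top_p):
    """Return the smallest set of tokens, in descending probability
    order, whose cumulative probability reaches top_p."""
    total, kept = 0.0, []
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        total += p
        if total >= top_p:
            break  # nucleus reached; remaining tokens are excluded
    return kept

# Hypothetical distribution over four candidate tokens:
dist = {"the": 0.5, "a": 0.3, "an": 0.15, "xyzzy": 0.05}
print(nucleus_candidates(dist, 0.7))  # -> ['the', 'a']
```

A lower top_p keeps only the most likely tokens; a higher top_p admits lower-probability ones, so the candidate set size adapts to how peaked the distribution is.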
top_k
Limits sampling to the k highest-probability tokens at each step. Default: 50.
0.5
Number of generations to return. Default: 1.
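Putting the parameters above together, a minimal request sketch using only the standard library — the endpoint URL and token are placeholders, not values from this reference:

```python
import json
import urllib.request

# Placeholder values -- substitute your provider's endpoint and your token.
API_URL = "https://example.com/v1/chat/completions"
API_TOKEN = "YOUR_AUTH_TOKEN"

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # the default model
    "messages": [  # 1 to 10 elements
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 512,   # 1 <= x <= 4096
    "temperature": 0.7,
    "top_p": 0.7,
    "stream": False,     # set True for Server-Sent Events
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # uncomment to actually send
print(request.get_header("Authorization"))  # -> Bearer YOUR_AUTH_TOKEN
```

The request is only constructed here, not sent; the commented urlopen call shows where the network round trip would happen.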