POST /chat/completions
curl --request POST \
  --url https://api.horay.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Unlock your AI Creativity with Horay.ai'\''s Blazing Fast, Affordable and Production Ready API, What impact will it have on the industry?"
    }
  ],
  "stream": false,
  "max_tokens": 512,
  "stop": [
    "<string>"
  ],
  "temperature": 0.7,
  "top_p": 0.7,
  "top_k": 50,
  "frequency_penalty": 0.5,
  "n": 1
}'
{
  "id": "<string>",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  },
  "created": 123,
  "model": "<string>",
  "object": "chat.completion"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
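For reference, a minimal non-streaming request in Python. This is a sketch, not an official client; the requests library and the HORAY_API_KEY environment variable are assumptions of the example, not part of this spec.

import os
import requests

# Hypothetical setup: the auth token is read from an environment variable.
API_KEY = os.environ["HORAY_API_KEY"]

response = requests.post(
    "https://api.horay.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",  # Bearer authentication header
        "Content-Type": "application/json",
    },
    json={
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 512,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])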

Body

application/json
model
enum<string>
default:meta-llama/Meta-Llama-3.1-8B-Instruct
required

The name of the model to query.

Available options:
meta-llama/Meta-Llama-3.1-405B-Instruct,
meta-llama/Meta-Llama-3.1-70B-Instruct,
meta-llama/Meta-Llama-3.1-8B-Instruct,
meta-llama/Llama-3.2-1B-Instruct,
meta-llama/Llama-3.2-3B-Instruct,
nvidia/Llama-3.1-Nemotron-70B-Instruct,
google/gemma-2-27b-it,
google/gemma-2-9b-it,
Qwen/Qwen2.5-72B-Instruct,
Qwen/Qwen2.5-Coder-32B-Instruct,
Qwen/Qwen2.5-7B-Instruct,
Qwen/Qwen2-72B-Instruct,
Gryphe/MythoMax-L2-13b,
gpt-4o-2024-11-20,
gpt-4o-2024-08-06,
gpt-4o-mini,
o1-preview,
o1-mini,
claude-3-5-sonnet-v2@20241022,
claude-3-5-sonnet@20240620
Example:

"meta-llama/Meta-Llama-3.1-8B-Instruct"

messages
object[]
required

A list of messages comprising the conversation so far.
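Since the API is stateless, a multi-turn conversation is expressed by replaying earlier turns in this list on every request. A sketch of the shape (the "system" role is an assumption based on common chat-completion schemas; only "user" and "assistant" appear in the examples on this page):

# Hypothetical multi-turn payload: prior turns are appended in order.
messages = [
    {"role": "system", "content": "You are a concise assistant."},  # assumed role
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "Sampling from the smallest token set whose cumulative probability exceeds top_p."},
    {"role": "user", "content": "How does top_k differ?"},
]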

stream
boolean
default:false

If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE] (see the streaming sketch below).

Example:

false
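A minimal streaming consumer in Python. Only the data: [DONE] terminator is documented above; the data: prefix framing and the delta chunk shape are assumptions borrowed from common chat-completion streams.

import json
import os
import requests

resp = requests.post(
    "https://api.horay.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['HORAY_API_KEY']}"},
    json={
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,  # keep the HTTP connection open for Server-Sent Events
)
for line in resp.iter_lines():
    if not line:
        continue  # skip SSE keep-alive blank lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":  # documented stream terminator
        break
    chunk = json.loads(payload)
    # The "delta" shape below is assumed, mirroring common streaming schemas.
    print(chunk["choices"][0].get("delta", {}).get("content", ""), end="", flush=True)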

max_tokens
integer
default:512

The maximum number of tokens to generate.

Required range: 1 <= x <= 4096
Example:

512

stop
string[]

A list of string sequences at which to stop (truncate) text generation, as in the sketch below.
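For instance, stop sequences can keep a completion from running past a blank line or into a simulated next turn; the sequences below are purely illustrative:

# Hypothetical payload fragment: generation halts when either sequence
# would appear in the output.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "List three colors."}],
    "stop": ["\n\n", "User:"],  # illustrative stop sequences
}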

temperature
number
default:0.7

Controls the degree of randomness in the response; lower values make output more deterministic.

Example:

0.7

top_p
number
default:0.7

The top_p (nucleus sampling) parameter dynamically adjusts the number of candidate tokens considered for each prediction, keeping only the smallest set whose cumulative probability exceeds top_p.

Example:

0.7

top_k
number
default:50

Limits sampling to the k most likely candidate tokens at each step.

Example:

50

frequency_penalty
number
default:0.5

Penalizes tokens in proportion to how often they have already appeared in the text, reducing repetition.

Example:

0.5
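As a combined sketch, the sampling controls above are set side by side on the request body; the values shown are simply the documented defaults:

# Hypothetical payload fragment combining the sampling parameters.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Name three planets."}],
    "temperature": 0.7,        # randomness of sampling
    "top_p": 0.7,              # nucleus-sampling cutoff
    "top_k": 50,               # candidate-token cap per step
    "frequency_penalty": 0.5,  # repetition penalty
}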

n
integer
default:1

The number of generations (choices) to return.

Example:

1

Response

200
application/json
id
string

A unique identifier for the completion.

choices
object[]

The generated completions; each choice contains a message (role and content) and a finish_reason.

usage
object

Token usage for the request: prompt_tokens, completion_tokens, and total_tokens.

created
integer

The Unix timestamp of when the completion was created.

model
string

The model that produced the completion.

object
enum<string>

The object type.

Available options:
chat.completion
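Continuing the non-streaming sketch from the Authorizations section, the documented response fields can be read like this (error handling elided):

# `response` is the requests.Response from the earlier sketch.
data = response.json()

for choice in data["choices"]:  # one entry per requested generation (n)
    print(choice["finish_reason"], "-", choice["message"]["content"])

usage = data["usage"]
print(f"{data['model']} ({data['object']}), id={data['id']}")
print(f"tokens: {usage['prompt_tokens']} prompt + "
      f"{usage['completion_tokens']} completion = {usage['total_tokens']} total")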