LLM Services

List Available Models

GET /api/v1/models

Retrieves a list of available LLM models based on your client's scope permissions.

Scope Requirements

  • llm-confidential: Access to confidential models only (e.g., phala/*)

  • llm-other: Access to non-confidential models only

  • llm-all: Access to all models

Authorizations

Authorization · string · Required

Enter JWT access token

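A minimal sketch of calling this endpoint from Python with the requests library. The host, the LLM_JWT_TOKEN environment variable name, and the OpenAI-style response shape ({"data": [...]}) are illustrative assumptions, not part of this spec:

```python
import os

import requests

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment's URL
TOKEN = os.environ["LLM_JWT_TOKEN"]   # hypothetical env var holding the JWT access token

resp = requests.get(
    f"{BASE_URL}/api/v1/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Assuming an OpenAI-style list response: {"data": [{"id": "..."}]}
for model in resp.json().get("data", []):
    print(model["id"])
```

Which models the call returns depends on the token's scope: llm-all sees everything, while llm-confidential and llm-other each see only their subset.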

Create Chat Completion

POST /api/v1/chat/completions

Generates a chat completion using the specified LLM model. Supports both streaming and non-streaming responses.

OpenAI Compatibility

This endpoint is fully compatible with OpenAI's chat completions API, allowing you to use existing OpenAI client libraries by simply changing the base URL.
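For instance, a sketch using the official openai Python SDK; the host is a placeholder, and since the SDK appends /chat/completions to base_url, the client is pointed at the /api/v1 prefix:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/api/v1",  # hypothetical host; the SDK adds /chat/completions
    api_key=os.environ["LLM_JWT_TOKEN"],        # JWT access token in place of an OpenAI key
)

completion = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```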

Streaming Support

Set stream: true to receive Server-Sent Events for real-time response streaming.
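Continuing the SDK sketch above, a streamed response might be consumed like this (some frames can carry no choices or an empty delta, hence the guard):

```python
stream = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{"role": "user", "content": "Write a haiku about enclaves."}],
    stream=True,  # partial message deltas arrive as Server-Sent Events
)
for chunk in stream:
    # Skip chunks without choices or with an empty content delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```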

Model Access Control

  • Confidential models (phala/*): Require llm-confidential or llm-all scope

  • Non-confidential models: Require llm-other or llm-all scope

Usage Tracking

All requests are automatically tracked for billing and monitoring purposes, including:

  • Token usage (prompt, completion, total)

  • Model used and confidential status

  • Request metadata and timing

Authorizations

Authorization · string · Required

Enter JWT access token

Body

Request body for creating a chat completion. A complete example request follows the parameter list below.

model · string · Required

ID of the model to use

Example: phala/gpt-oss-120b
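
messages · object[] · Required

A list of messages comprising the conversation so far (OpenAI chat format; this field is implied by the OpenAI-compatibility guarantee above)

Example: [{"role": "user", "content": "Hello"}]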
max_tokens · number · min: 1 · Optional

The maximum number of tokens to generate

Example: 1000

temperature · number · min: 0 · max: 2 · Optional

Sampling temperature between 0 and 2. Higher values make output more random

Default: 1 · Example: 0.7

top_p · number · max: 1 · Optional

Nucleus sampling parameter. Alternative to temperature

Default: 1 · Example: 0.9

n · number · min: 1 · Optional

Number of chat completion choices to generate

Default: 1 · Example: 1

stream · boolean · Optional

Whether to stream back partial message deltas

Default: false · Example: false

stop · one of · Optional

Up to 4 sequences where the API will stop generating further tokens

string · Optional
or
string[] · Optional · Example: ["\n", "."]

presence_penalty · number · min: -2 · max: 2 · Optional

Presence penalty between -2.0 and 2.0

Default: 0 · Example: 0

frequency_penalty · number · min: -2 · max: 2 · Optional

Frequency penalty between -2.0 and 2.0

Default: 0 · Example: 0

user · string · Optional

Unique identifier representing your end-user

Example: user-123

seed · number · Optional

Random seed for deterministic outputs

Example: 42

tools · object[] · Optional

A list of tools the model may call

tool_choice · one of · Optional

Controls which (if any) function is called by the model

Example: auto

string · enum · Optional
or
object · Optional
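Putting the body parameters together, a hedged end-to-end sketch using plain requests. The host, the LLM_JWT_TOKEN variable, and the get_weather tool are illustrative assumptions; the messages and tool definitions follow the OpenAI chat format:

```python
import os

import requests

payload = {
    "model": "phala/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "max_tokens": 1000,
    "temperature": 0.7,
    "seed": 42,
    "user": "user-123",
    # Hypothetical tool, defined in the OpenAI function-tool format
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

resp = requests.post(
    "https://api.example.com/api/v1/chat/completions",  # hypothetical host
    headers={"Authorization": f"Bearer {os.environ['LLM_JWT_TOKEN']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

choice = resp.json()["choices"][0]
# The model may answer directly (content) or request a tool invocation (tool_calls)
print(choice["message"].get("tool_calls") or choice["message"].get("content"))
```

As in the OpenAI API, when tool_choice resolves to a function call the reply carries tool_calls instead of content, so callers should handle both branches.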
