LLM Services

List Available Models

GET /api/v1/models

Retrieves a list of available LLM models based on your client's scope permissions.

Scope Requirements

  • llm-confidential: Access to confidential models only (e.g., phala/*)

  • llm-other: Access to non-confidential models only

  • llm-all: Access to all models

Authorizations

Authorization · string · Required

Enter JWT access token

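A minimal sketch of calling this endpoint from Python with the requests library. The host, the LLM_JWT_TOKEN environment variable name, and the OpenAI-style response shape ({"data": [...]}) are illustrative assumptions, not part of this spec:

```python
import os

import requests

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment's URL
TOKEN = os.environ["LLM_JWT_TOKEN"]   # hypothetical env var holding the JWT access token

resp = requests.get(
    f"{BASE_URL}/api/v1/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Assuming an OpenAI-style list response: {"data": [{"id": "..."}]}
for model in resp.json().get("data", []):
    print(model["id"])
```

Which models the call returns depends on the token's scope: llm-all sees everything, while llm-confidential and llm-other each see only their subset.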

Create Chat Completion

POST /api/v1/chat/completions

Generates a chat completion using the specified LLM model. Supports both streaming and non-streaming responses.

OpenAI Compatibility

This endpoint is fully compatible with OpenAI's chat completions API, allowing you to use existing OpenAI client libraries by simply changing the base URL.
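For instance, a sketch using the official openai Python SDK; the host is a placeholder, and since the SDK appends /chat/completions to base_url, the client is pointed at the /api/v1 prefix:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/api/v1",  # hypothetical host; the SDK adds /chat/completions
    api_key=os.environ["LLM_JWT_TOKEN"],        # JWT access token in place of an OpenAI key
)

completion = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```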

Streaming Support

Set stream: true to receive Server-Sent Events for real-time response streaming.
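Continuing the SDK sketch above, a streamed response might be consumed like this (some frames can carry no choices or an empty delta, hence the guard):

```python
stream = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{"role": "user", "content": "Write a haiku about enclaves."}],
    stream=True,  # partial message deltas arrive as Server-Sent Events
)
for chunk in stream:
    # Skip chunks without choices or with an empty content delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```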

Model Access Control

  • Confidential models (phala/*): Require llm-confidential or llm-all scope

  • Non-confidential models: Require llm-other or llm-all scope

Usage Tracking

All requests are automatically tracked for billing and monitoring purposes, including:

  • Token usage (prompt, completion, total)

  • Model used and confidential status

  • Request metadata and timing

Authorizations

Authorization · string · Required

Enter JWT access token

Body

Request body for creating a chat completion. A complete example request follows the parameter list below.

model · string · Required

ID of the model to use

Example: phala/gpt-oss-120b
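
messages · object[] · Required

A list of messages comprising the conversation so far (OpenAI chat format; this field is implied by the OpenAI-compatibility guarantee above)

Example: [{"role": "user", "content": "Hello"}]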
max_tokens · number · min: 1 · Optional

The maximum number of tokens to generate

Example: 1000

temperature · number · min: 0 · max: 2 · Optional

Sampling temperature between 0 and 2. Higher values make output more random

Default: 1 · Example: 0.7

top_p · number · max: 1 · Optional

Nucleus sampling parameter. Alternative to temperature

Default: 1 · Example: 0.9

n · number · min: 1 · Optional

Number of chat completion choices to generate

Default: 1 · Example: 1

stream · boolean · Optional

Whether to stream back partial message deltas

Default: false · Example: false

stop · one of · Optional

Up to 4 sequences where the API will stop generating further tokens

string · Optional
or
string[] · Optional · Example: ["\n", "."]

presence_penalty · number · min: -2 · max: 2 · Optional

Presence penalty between -2.0 and 2.0

Default: 0 · Example: 0

frequency_penalty · number · min: -2 · max: 2 · Optional

Frequency penalty between -2.0 and 2.0

Default: 0 · Example: 0

user · string · Optional

Unique identifier representing your end-user

Example: user-123

seed · number · Optional

Random seed for deterministic outputs

Example: 42

tools · object[] · Optional

A list of tools the model may call

tool_choice · one of · Optional

Controls which (if any) function is called by the model

Example: auto

string · enum · Optional
or
object · Optional
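Putting the body parameters together, a hedged end-to-end sketch using plain requests. The host, the LLM_JWT_TOKEN variable, and the get_weather tool are illustrative assumptions; the messages and tool definitions follow the OpenAI chat format:

```python
import os

import requests

payload = {
    "model": "phala/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "max_tokens": 1000,
    "temperature": 0.7,
    "seed": 42,
    "user": "user-123",
    # Hypothetical tool, defined in the OpenAI function-tool format
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

resp = requests.post(
    "https://api.example.com/api/v1/chat/completions",  # hypothetical host
    headers={"Authorization": f"Bearer {os.environ['LLM_JWT_TOKEN']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

choice = resp.json()["choices"][0]
# The model may answer directly (content) or request a tool invocation (tool_calls)
print(choice["message"].get("tool_calls") or choice["message"].get("content"))
```

As in the OpenAI API, when tool_choice resolves to a function call the reply carries tool_calls instead of content, so callers should handle both branches.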
