LLM Access

Use BlueNexus's OpenAI-compatible LLM API for direct chat completions without agents.

Required scope: llm-all

Chat Completions

curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain MCP in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Response (OpenAI-compatible):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "bluenexus/glm-4.7-flash-tee",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "MCP (Model Context Protocol) is an open protocol..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 80,
    "total_tokens": 105
  }
}

Response Headers

Header	Description
`X-Credits-Consumed`	Credits consumed by the request
`X-Credits-Remaining`	Credit balance remaining after the request
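
A minimal sketch of reading these headers in Python. It assumes the header values are plain decimal numbers, which the table above does not specify. `parse_credit_headers` is a hypothetical helper name; it accepts any header mapping, such as the `headers` attribute of an HTTP client's response object.

```python
from typing import Mapping, Optional


def parse_credit_headers(headers: Mapping[str, str]) -> dict[str, Optional[float]]:
    """Extract credit usage from response headers.

    ASSUMPTION: header values are plain decimal numbers. Missing or
    non-numeric values are returned as None rather than raising.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def to_number(name: str) -> Optional[float]:
        raw = lowered.get(name.lower())
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            return None

    return {
        "consumed": to_number("X-Credits-Consumed"),
        "remaining": to_number("X-Credits-Remaining"),
    }
```

Returning `None` for an absent header keeps the caller's code path the same whether or not a given response carries credit information.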

Streaming

curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "user", "content": "Write a haiku about APIs"}
    ],
    "stream": true
  }'

Response (Server-Sent Events):

data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Endpoints"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" await"}}]}

data: [DONE]
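
The event stream above can be reassembled into text with a small parser. The sketch below assumes each event arrives as a single `data:` line, as in the sample, and that the stream ends with the `[DONE]` sentinel. `iter_stream_content` is a hypothetical helper name, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a chat completion SSE stream."""
    for line in lines:
        line = line.strip()
        # Skip blank separators and any SSE field other than "data:".
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role, so content may be absent.
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

Joining the yielded fragments in order gives the full reply. The function takes any iterable of decoded lines, so it can be fed from a file, a list, or a streaming HTTP response.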

List Available Models

curl https://api.bluenexus.ai/api/v1/models \
  -H "Authorization: Bearer ACCESS_TOKEN"
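
The same call can be made from Python with the standard library. The response body is not shown on this page, so the sketch below assumes it follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`). Both function names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://api.bluenexus.ai/api/v1/models"


def extract_model_ids(payload: dict) -> list[str]:
    """Return model IDs from a list-models response.

    ASSUMPTION: the body follows the OpenAI list shape,
    {"object": "list", "data": [{"id": "..."}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(access_token: str) -> list[str]:
    """Fetch the models endpoint and return the IDs it reports."""
    request = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Keeping the parsing in `extract_model_ids` separate from the network call makes it easy to adjust if the actual response shape differs.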

Request Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model ID
`messages`	array	Yes	Conversation messages
`temperature`	number	No	Sampling temperature, 0-2 (default: 1)
`max_tokens`	number	No	Maximum number of tokens to generate in the response
`stream`	boolean	No	Enable SSE streaming
`top_p`	number	No	Nucleus sampling threshold, 0-1 (default: 1)
`stop`	string/array	No	Stop sequences
`tools`	array	No	Function definitions
`tool_choice`	string/object	No	`"auto"`, `"none"`, or a specific function
`response_format`	object	No	`text`, `json_object`, or `json_schema`
`seed`	number	No	Deterministic seed
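
As an illustration of the table above, the following sketch builds a request body and checks the documented ranges client-side before sending. `build_chat_request` is a hypothetical helper. It covers only a few of the optional parameters, and rejecting an empty `messages` list is an assumption rather than documented behavior.

```python
from typing import Any, Optional, Sequence


def build_chat_request(
    model: str,
    messages: Sequence[dict],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a chat completions body, validating the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # ASSUMPTION: an empty conversation is treated as invalid.
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    body: dict[str, Any] = {"model": model, "messages": list(messages)}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Omit unset parameters so the server-side defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

The returned dictionary can be serialized with `json.dumps` and sent as the POST body, or unpacked into an OpenAI SDK call.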

Using with OpenAI SDKs

Because the API is OpenAI-compatible, existing OpenAI client libraries work once you set the base URL to the BlueNexus endpoint and pass your access token as the API key:

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BLUENEXUS_TOKEN",
    base_url="https://api.bluenexus.ai/api/v1"
)

response = client.chat.completions.create(
    model="bluenexus/glm-4.7-flash-tee",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_BLUENEXUS_TOKEN",
  baseURL: "https://api.bluenexus.ai/api/v1",
});

const response = await client.chat.completions.create({
  model: "bluenexus/glm-4.7-flash-tee",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

Credit Consumption

LLM usage is billed in credits based on token consumption, at a rate that depends on the model tier. Each response reports the credits it consumed in the `X-Credits-Consumed` header.

Rate Limit

The API allows 30 requests per minute. Requests over the limit receive 429 Too Many Requests.
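
One common way to handle a 429 is to retry with exponential backoff. The sketch below is one possible approach, not documented BlueNexus behavior. `send` is a hypothetical callable that performs the request and returns `(status_code, body)`. The 2-second base delay is the average spacing implied by 30 requests per minute; how the server windows the limit is not stated here.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a status other than 429.

    Returns the last (status_code, body) pair, which is still a 429
    if every attempt was rate limited.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return status, body
```

The `sleep` parameter exists so the waiting can be replaced in tests. In normal use the default `time.sleep` applies.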

Next Steps

Agent Chat Completions — Chat with agents that have tools and context
Pricing & Credits — Credit system details