LLM Access

Use BlueNexus's OpenAI-compatible LLM API for direct chat completions without agents.

Required scope: llm-all

Chat Completions

curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain MCP in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
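If you would rather not add an SDK dependency, you can build the same request with Python's standard library. This sketch only builds the request and does not send it. build_chat_request and API_BASE are illustrative names, not part of any BlueNexus library, and ACCESS_TOKEN is the same placeholder used in the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.bluenexus.ai/api/v1"  # base URL from the curl example

def build_chat_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "ACCESS_TOKEN",
    {
        "model": "bluenexus/glm-4.7-flash-tee",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain MCP in one paragraph."},
        ],
        "temperature": 0.7,
        "max_tokens": 500,
    },
)

# To send it and decode the JSON reply:
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
```

Passing req to urllib.request.urlopen performs the call. The commented lines show one way to read the reply.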

Response (OpenAI-compatible):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "bluenexus/glm-4.7-flash-tee",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "MCP (Model Context Protocol) is an open protocol..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 80,
    "total_tokens": 105
  }
}
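Reading the reply only requires standard JSON handling. This minimal sketch takes the response shown above and extracts the assistant text, the finish reason and the total token count. read_completion is a hypothetical helper, and it reads only the first choice.

```python
import json

def read_completion(raw: str) -> tuple[str, str, int]:
    """Return (content, finish_reason, total_tokens) for the first choice."""
    data = json.loads(raw)
    choice = data["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        data["usage"]["total_tokens"],
    )

# Abridged copy of the sample response above.
sample = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant",
                "content": "MCP (Model Context Protocol) is an open protocol..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 80, "total_tokens": 105}
}"""

content, finish_reason, total = read_completion(sample)
print(finish_reason, total)  # stop 105
```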

Response Headers

| Header | Description |
|---|---|
| X-Credits-Consumed | Credits used by this request |
| X-Credits-Remaining | Remaining credit balance |
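HTTP header names are case-insensitive. A case-insensitive lookup is therefore safer than a plain dict. This sketch uses email.message.Message, the base class of the header object that urllib returns. It assumes both headers contain plain numeric strings, which this page does not specify. The values 3 and 997 are made up for illustration, and credit_headers is a hypothetical name.

```python
from email.message import Message
from typing import Optional

def credit_headers(headers: Message) -> tuple[Optional[float], Optional[float]]:
    """Return (consumed, remaining) from the X-Credits-* headers; None if absent."""
    def as_number(name: str) -> Optional[float]:
        value = headers.get(name)  # Message.get matches names case-insensitively
        return float(value) if value is not None else None

    return as_number("X-Credits-Consumed"), as_number("X-Credits-Remaining")

# Made-up values; the lower-case name still matches.
msg = Message()
msg["x-credits-consumed"] = "3"
msg["X-Credits-Remaining"] = "997"
print(credit_headers(msg))  # (3.0, 997.0)
```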

Streaming

curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "user", "content": "Write a haiku about APIs"}
    ],
    "stream": true
  }'

Response (Server-Sent Events):

data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Endpoints"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" await"}}]}

data: [DONE]
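Each data: line carries a JSON chunk. The chunk's choices[0].delta may contain a piece of the reply, and data: [DONE] ends the stream. Below is a minimal parser for this framing. It assumes one choice per chunk and one event per line, as in the sample. iter_deltas is a hypothetical name, and the parser does not handle SSE events that span several data: lines.

```python
import json
from typing import Iterable, Iterator

def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment/keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:  # the first chunk carries only the role
                yield text

stream = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Endpoints"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" await"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(stream)))  # Endpoints await
```

With urllib, iterate over the response object and decode each bytes line with .decode("utf-8") before passing it to the parser.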

List Available Models

curl https://api.bluenexus.ai/api/v1/models \
  -H "Authorization: Bearer ACCESS_TOKEN"
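This page does not show the response body for this endpoint. If it follows the OpenAI list shape ({"object": "list", "data": [{"id": ...}]}), which is an assumption, the model IDs can be extracted as follows. model_ids is a hypothetical helper. The sample contains only the model ID used elsewhere on this page and is not a real listing.

```python
import json

def model_ids(raw: str) -> list[str]:
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    data = json.loads(raw)
    return [model["id"] for model in data.get("data", [])]

# Illustrative body only; not a real /models response.
models_sample = '{"object": "list", "data": [{"id": "bluenexus/glm-4.7-flash-tee", "object": "model"}]}'
print(model_ids(models_sample))  # ['bluenexus/glm-4.7-flash-tee']
```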

Request Parameters

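| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| messages | array | Yes | Conversation messages, each with a role and content |
| temperature | number | No | Sampling temperature, 0-2 (default: 1) |
| max_tokens | number | No | Maximum number of tokens to generate |
| stream | boolean | No | Stream the response as Server-Sent Events |
| top_p | number | No | Nucleus sampling, 0-1 (default: 1) |
| stop | string/array | No | One or more stop sequences |
| tools | array | No | Function (tool) definitions the model may call |
| tool_choice | string/object | No | "auto", "none", or an object naming a specific function |
| response_format | object | No | Output format: text, json_object, or json_schema |
| seed | number | No | Random seed for more reproducible sampling |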
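You can catch out-of-range values before you spend a request. This hypothetical check_params helper checks only the required fields and the numeric ranges from the table. It also checks the response_format type, assuming the OpenAI-style {"type": ...} object. The server decides what it actually accepts.

```python
def check_params(payload: dict) -> list[str]:
    """List problems with a chat completions payload (empty list if none found)."""
    problems = []
    for key in ("model", "messages"):
        if key not in payload:
            problems.append(f"missing required parameter: {key}")

    # Ranges taken from the parameter table.
    for key, low, high in (("temperature", 0, 2), ("top_p", 0, 1)):
        value = payload.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"{key} must be between {low} and {high}")

    # Assumes the OpenAI-style {"type": ...} shape for response_format.
    fmt = payload.get("response_format")
    if fmt is not None and (
        not isinstance(fmt, dict)
        or fmt.get("type") not in ("text", "json_object", "json_schema")
    ):
        problems.append("response_format.type must be text, json_object, or json_schema")
    return problems

print(check_params({"model": "bluenexus/glm-4.7-flash-tee", "messages": [], "temperature": 2.5}))
# ['temperature must be between 0 and 2']
```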

Using with OpenAI SDKs

Because the API is OpenAI-compatible, you can use the official OpenAI client libraries. Set the base URL to BlueNexus and pass your BlueNexus access token as the API key:

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BLUENEXUS_TOKEN",
    base_url="https://api.bluenexus.ai/api/v1"
)

response = client.chat.completions.create(
    model="bluenexus/glm-4.7-flash-tee",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_BLUENEXUS_TOKEN",
  baseURL: "https://api.bluenexus.ai/api/v1",
});

const response = await client.chat.completions.create({
  model: "bluenexus/glm-4.7-flash-tee",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

Credit Consumption

LLM usage is billed in credits based on token consumption. The rate depends on the model tier. Each response reports the credits it consumed in the X-Credits-Consumed header and your remaining balance in the X-Credits-Remaining header.
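To keep a running total across several calls, pass each response's headers to a small accumulator. CreditMeter and has_headroom are hypothetical names, and you choose the reserve threshold for your application. The sketch assumes header values are numeric strings. Plain dicts, as in the example, are case-sensitive. For real responses, pass resp.headers from urllib, whose .get ignores case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CreditMeter:
    """Running tally of the X-Credits-* headers across requests."""
    consumed: float = 0.0
    remaining: Optional[float] = None

    def record(self, headers) -> None:
        # headers: anything with .get(name), e.g. a dict or resp.headers from urllib
        used = headers.get("X-Credits-Consumed")
        if used is not None:
            self.consumed += float(used)
        left = headers.get("X-Credits-Remaining")
        if left is not None:
            self.remaining = float(left)  # the latest balance wins

    def has_headroom(self, reserve: float) -> bool:
        """True if the last known balance exceeds reserve, or none is known yet."""
        return self.remaining is None or self.remaining > reserve

meter = CreditMeter()
meter.record({"X-Credits-Consumed": "3", "X-Credits-Remaining": "997"})
meter.record({"X-Credits-Consumed": "2", "X-Credits-Remaining": "995"})
print(meter.consumed, meter.remaining)  # 5.0 995.0
```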

Rate Limit

The LLM API is limited to 30 requests per minute. Requests over this limit are rejected with 429 Too Many Requests.
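A common way to handle 429 responses is to retry with exponential backoff and jitter. In this sketch, send is a hypothetical callable that wraps your HTTP client and returns (status, headers, body). This page does not say whether BlueNexus sends a Retry-After header, so the code uses that header only if it is present and numeric. The 2-second base delay comes from the documented limit (60 s / 30 requests).

```python
import random
import time
from typing import Callable

# Hypothetical response shape for this sketch: (status, headers, body).
Response = tuple[int, dict, str]

def with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 2.0,  # 60 s / 30 requests = one request every 2 s
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                delay = float(retry_after)  # honor the server's hint if numeric
            except ValueError:
                pass  # e.g. an HTTP-date; keep the computed backoff
        sleep(delay)
    raise RuntimeError("unreachable")

# Simulated server: two 429s, then success. Sleeps are recorded, not performed.
calls = iter([(429, {"Retry-After": "1"}, ""), (429, {}, ""), (200, {}, "ok")])
delays: list[float] = []
status, _, body = with_backoff(lambda: next(calls), sleep=delays.append)
print(status, body, len(delays))  # 200 ok 2
```

Injecting sleep keeps the function easy to test. In production, the default time.sleep applies.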

Next Steps