LLM Access
Use BlueNexus's OpenAI-compatible LLM API for direct chat completions without agents.
Required scope: llm-all
Chat Completions
curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain MCP in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
Response (OpenAI-compatible):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "bluenexus/glm-4.7-flash-tee",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "MCP (Model Context Protocol) is an open protocol..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 80,
    "total_tokens": 105
  }
}
Response Headers
| Header | Description |
|---|---|
| X-Credits-Consumed | Credits used |
| X-Credits-Remaining | Remaining balance |
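As a rough sketch of how a client might read these two headers, the helper below pulls both values out of a response-header mapping. The name `parse_credit_headers` is hypothetical (it is not part of any BlueNexus SDK), and the code assumes the header values are numeric strings, which the table above does not state.

```python
from typing import Mapping, Optional


def parse_credit_headers(headers: Mapping[str, str]) -> dict[str, Optional[float]]:
    """Extract credit usage from response headers (hypothetical helper).

    Assumption: header values are numeric strings such as "1.5".
    Missing or non-numeric values are returned as None.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def to_number(name: str) -> Optional[float]:
        raw = lowered.get(name.lower())
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            return None

    return {
        "consumed": to_number("X-Credits-Consumed"),
        "remaining": to_number("X-Credits-Remaining"),
    }
```

With the OpenAI Python SDK, the raw headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()`; passing `.headers` to this helper is one way to log credit usage per request.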
Streaming
curl -X POST https://api.bluenexus.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bluenexus/glm-4.7-flash-tee",
    "messages": [
      {"role": "user", "content": "Write a haiku about APIs"}
    ],
    "stream": true
  }'
Response (Server-Sent Events):
data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Endpoints"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" await"}}]}
data: [DONE]
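To illustrate how the event stream above can be consumed, here is a minimal sketch that reassembles the message text from raw `data:` lines. The helper name `iter_content_deltas` is hypothetical, and the sketch assumes every event fits on a single `data:` line, as in the sample; it does not handle multi-line SSE events.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel, not JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The sample events shown above, fed through the parser.
sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Endpoints"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" await"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # Endpoints await
```

The first event carries only the role, so it contributes no text. When using the OpenAI Python SDK with `stream=True`, this parsing is done by the client, and each chunk exposes the fragment as `chunk.choices[0].delta.content`.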
List Available Models
curl https://api.bluenexus.ai/api/v1/models \
  -H "Authorization: Bearer ACCESS_TOKEN"
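The response body for this endpoint is not shown here. Assuming it follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`), which is an assumption and not something this page confirms, a small hypothetical helper could collect the model IDs like this:

```python
from typing import Any


def extract_model_ids(body: dict[str, Any]) -> list[str]:
    """Return model IDs from an OpenAI-style model list (assumed shape)."""
    # Entries without an "id" key are skipped rather than raising.
    return [entry["id"] for entry in body.get("data", []) if "id" in entry]


# Illustrative body only; the real response may contain more fields.
example_body = {
    "object": "list",
    "data": [{"id": "bluenexus/glm-4.7-flash-tee", "object": "model"}],
}
print(extract_model_ids(example_body))
```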
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| messages | array | Yes | Conversation messages |
| temperature | number | No | 0-2 (default: 1) |
| max_tokens | number | No | Max response tokens |
| stream | boolean | No | Enable SSE streaming |
| top_p | number | No | 0-1 (default: 1) |
| stop | string/array | No | Stop sequences |
| tools | array | No | Function definitions |
| tool_choice | string | No | "auto", "none", or specific function |
| response_format | object | No | text, json_object, or json_schema |
| seed | number | No | Deterministic seed |
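The ranges in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_chat_payload` is a hypothetical helper, and rejecting an empty `messages` list is an assumption of this sketch, since the table only says the field is required.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    **extra: Any,
) -> dict[str, Any]:
    """Build a request body, validating the documented ranges (hypothetical)."""
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty list is treated the same as a missing field.
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload: dict[str, Any] = {"model": model, "messages": messages}
    # Other table parameters (stop, tools, seed, ...) pass through via **extra.
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        **extra,
    }
    # Omit unset fields so the server-side defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload
```

Leaving unset parameters out of the body, rather than sending `null`, keeps the documented defaults (for example `temperature` 1) in effect.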
Using with OpenAI SDKs
Since the API is OpenAI-compatible, you can use existing OpenAI client libraries:
Python
from openai import OpenAI

# Point the standard OpenAI client at the BlueNexus endpoint.
client = OpenAI(
    api_key="YOUR_BLUENEXUS_TOKEN",
    base_url="https://api.bluenexus.ai/api/v1",
)

response = client.chat.completions.create(
    model="bluenexus/glm-4.7-flash-tee",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
TypeScript
import OpenAI from "openai";

// Point the standard OpenAI client at the BlueNexus endpoint.
const client = new OpenAI({
  apiKey: "YOUR_BLUENEXUS_TOKEN",
  baseURL: "https://api.bluenexus.ai/api/v1",
});

const response = await client.chat.completions.create({
  model: "bluenexus/glm-4.7-flash-tee",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
Credit Consumption
LLM usage is billed in credits based on token consumption. The exact rate depends on the model tier. Credit consumption is returned in the X-Credits-Consumed response header.
Rate Limit
The limit is 30 requests per minute. Requests beyond it are rejected with 429 Too Many Requests.
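A client can absorb occasional 429 responses by retrying with exponential backoff. The sketch below is a generic pattern, not a documented BlueNexus mechanism: `call_with_backoff` and its parameters are hypothetical, and because this page does not say whether a `Retry-After` header is sent, the delays are simply doubled on each attempt.

```python
import time
from typing import Any, Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a status other than 429 (hypothetical).

    send must return a (status_code, body) pair. Any non-429 result,
    including other errors, is handed back to the caller unchanged.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Wait 2s, 4s, 8s, ... but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the helper can be tested without real delays. The OpenAI Python SDK also retries rate-limited requests on its own, configurable through the client's `max_retries` option, so a wrapper like this is mainly useful with plain HTTP clients.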
Next Steps
- Agent Chat Completions: Chat with agents that have tools and context
- Pricing & Credits: Credit system details