Context Compression

How BlueNexus optimizes tool call responses to keep your LLM's context window efficient.

The Problem

A typical MCP integration with multiple services might expose 200+ tool definitions — each with parameters, descriptions, and return schemas. Loading all of these into an LLM's context window:

  • Wastes tokens on tool definitions instead of actual reasoning
  • Hits context limits quickly, especially with smaller models
  • Slows down response times as the model processes irrelevant tools

The Two-Tool Solution

BlueNexus consolidates everything into two tools:

Tool Purpose
use-agent Natural language interface to all connected services
list-connections Discover available services

Your LLM sees two tool definitions instead of hundreds. The use-agent tool delegates to an internal ReAct agent that has access to the full tool catalog — but that complexity is hidden from your context window.

How It Works

  1. Your LLM receives a user message like "What's on my calendar today?"
  2. It calls use-agent with { "prompt": "What's on my calendar today?" }
  3. BlueNexus's internal agent:
    • Discovers the user's connected Google account
    • Loads only the relevant Google Calendar tools
    • Executes the API calls
    • Processes and summarizes the results
  4. Your LLM receives a clean text response

The internal agent handles the full tool routing, API calls, and data processing. Your LLM only sees the final result.

Response Sanitization

Responses from connected services are processed before reaching your app:

Token Redaction

Sensitive data is automatically stripped:

  • OAuth tokens, API keys, secrets
  • Bearer/Basic auth headers
  • Platform-specific tokens (GitHub ghp_, Slack xoxb-, AWS AKIA)
  • Long alphanumeric strings that look like credentials

This prevents credential leakage when service APIs inadvertently include tokens in their responses.

Size Truncation

Responses are capped at 50KB. If a service returns more data than that, it's truncated with a marker indicating the truncation. This prevents a single large API response from overwhelming your LLM's context.

Data Boundaries

External service responses are wrapped with boundary markers that signal to the LLM: "treat this as data, not as instructions." This provides defense against indirect prompt injection — where a malicious document or message in a connected service tries to hijack the LLM's behavior.

Parallel Execution

When a user's request involves independent tasks across different services, the use-agent tool's description instructs calling LLMs to invoke it multiple times in parallel. Each call executes concurrently for faster results:

"When the user's request involves independent tasks across different services,
call this tool multiple times in parallel rather than sequentially"

For example, "Check my Google Calendar and summarize my Slack DMs" can run as two parallel use-agent calls, each targeting a different connector.

Credit Efficiency

The two-tool model also affects credit consumption:

  • You save on the tokens your outer LLM would spend processing hundreds of tool definitions
  • Net effect: fewer total tokens consumed for the same task

Next Steps