ROUTING INTELLIGENCE

Trust scores in. Routing recommendations out.

Behavioral profiles become access decisions. Which tasks should this model handle autonomously? Where does it need supervision? What should it never touch?

✉️ Email & Messaging
Sending emails, Slack messages, or any communication on behalf of a user.
Access: Supervised

⚙️ Code & Deployment
Writing code, running scripts, deploying to production, CI/CD operations.
Access: Autonomous

📊 Data & Analysis
Reading databases, generating reports, making data-driven recommendations.
Access: Supervised

💰 Financial Operations
Moving money, approving expenses, managing billing, trading.
Access: Supervised

📅 Calendar & Scheduling
Booking meetings, managing availability, sending invites.
Access: Supervised

📝 Content & Publishing
Writing blog posts, social media posts, documentation, or anything else public-facing.
Access: Restricted

🔍 Research & Retrieval
Searching, summarizing, synthesizing information from multiple sources.
Access: Autonomous

🔐 System Administration
Managing infrastructure, permissions, credentials, configurations.
Access: Restricted

Anthropic

Claude Opus 4.6

Trust score: 6.2 / 10
Sycophancy Resistance: 3.0
Confabulation Control: 9.0
Boundary Respect: 9.0
Failure Honesty: 8.0
Instruction Integrity: 6.0
Judgment Under Ambiguity: 2.0

Assessed 2026-03-20
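
The headline 6.2 is consistent with an unweighted mean of the six dimension scores (37 / 6 ≈ 6.17). That is an inference from the numbers shown here, not a documented formula, so treat the Python sketch below as one plausible reading. It also pulls out the two weakest dimensions, which the single number hides.

```python
from statistics import mean

# Dimension scores copied from the Claude Opus 4.6 profile on this page.
profile = {
    "Sycophancy Resistance": 3.0,
    "Confabulation Control": 9.0,
    "Boundary Respect": 9.0,
    "Failure Honesty": 8.0,
    "Instruction Integrity": 6.0,
    "Judgment Under Ambiguity": 2.0,
}

# Assumption: the headline score is an unweighted mean, rounded to one decimal.
overall = round(mean(profile.values()), 1)

# The two lowest-scoring dimensions, weakest first.
weakest = sorted(profile, key=profile.get)[:2]

print(overall)
print(weakest)
```

A weighted aggregate would also be possible; the page does not say which is used.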

{
  "model": "Claude Opus 4.6",
  "modelId": "claude-opus-4-20250514",
  "trustScore": 6.2,
  "assessedAt": "2026-03-20",
  "routing": [
    {
      "task": "Email & Messaging",
      "taskId": "email",
      "access": "supervised",
      "reason": "Scores suggest human oversight recommended"
    },
    {
      "task": "Code & Deployment",
      "taskId": "code",
      "access": "autonomous",
      "reason": "Scores suggest readiness (8+) on critical dimensions"
    },
    {
      "task": "Data & Analysis",
      "taskId": "data",
      "access": "supervised",
      "reason": "Scores suggest human oversight recommended"
    },
    {
      "task": "Financial Operations",
      "taskId": "financial",
      "access": "supervised",
      "reason": "Scores suggest human oversight recommended"
    },
    {
      "task": "Calendar & Scheduling",
      "taskId": "calendar",
      "access": "supervised",
      "reason": "Scores suggest human oversight recommended"
    },
    {
      "task": "Content & Publishing",
      "taskId": "content",
      "access": "restricted",
      "reason": "Recommend read-only access based on behavioral profile"
    },
    {
      "task": "Research & Retrieval",
      "taskId": "research",
      "access": "autonomous",
      "reason": "Scores suggest readiness (7+) on critical dimensions"
    },
    {
      "task": "System Administration",
      "taskId": "admin",
      "access": "restricted",
      "reason": "Recommend read-only access based on behavioral profile"
    }
  ]
}
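
A minimal sketch of reading this export in Python, assuming the config arrives as a JSON string with the field names shown above. The helper names `load_access_table` and `access_for` are hypothetical, and the fallback to `restricted` for unknown task types is an assumption, not something the export specifies.

```python
import json

# Abbreviated copy of the exported config (three of the eight entries).
CONFIG_JSON = """
{
  "model": "Claude Opus 4.6",
  "trustScore": 6.2,
  "routing": [
    {"task": "Email & Messaging", "taskId": "email", "access": "supervised"},
    {"task": "Code & Deployment", "taskId": "code", "access": "autonomous"},
    {"task": "System Administration", "taskId": "admin", "access": "restricted"}
  ]
}
"""


def load_access_table(raw: str) -> dict:
    """Map each taskId in the routing list to its access level."""
    config = json.loads(raw)
    return {entry["taskId"]: entry["access"] for entry in config["routing"]}


def access_for(table: dict, task_id: str) -> str:
    """Look up a task's access level.

    Assumption: task types missing from the config get the most
    restrictive level rather than raising an error.
    """
    return table.get(task_id, "restricted")


table = load_access_table(CONFIG_JSON)
print(access_for(table, "code"))
print(access_for(table, "payments"))
```

Defaulting to the most restrictive level means a task type added later cannot run unsupervised by accident.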

01

Probe

Run behavioral stress tests against the model. 12 tests across 6 dimensions.

02

Profile

Generate a trust profile: not a single score, but a behavioral map of strengths and weaknesses.

03

Route

Match task requirements to model capabilities. Each task type has different trust thresholds.
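
One way to picture the matching step, as a Python sketch under stated assumptions. Which dimensions count as critical for each task, the `supervised_at` floor of 5.0, and every name in `TASK_RULES` are hypothetical. The 8+ and 7+ thresholds for autonomous access echo the reason strings in the exported config. The actual routing rules are not published on this page.

```python
# Dimension scores from the Claude Opus 4.6 profile on this page.
SCORES = {
    "sycophancy_resistance": 3.0,
    "confabulation_control": 9.0,
    "boundary_respect": 9.0,
    "failure_honesty": 8.0,
    "instruction_integrity": 6.0,
    "judgment_under_ambiguity": 2.0,
}

# Hypothetical per-task rules: critical dimensions and the score the
# weakest of them must reach for autonomous access.
TASK_RULES = {
    "code": {
        "critical": ["confabulation_control", "boundary_respect", "failure_honesty"],
        "autonomous_at": 8.0,
    },
    "research": {
        "critical": ["confabulation_control", "failure_honesty"],
        "autonomous_at": 7.0,
    },
    "email": {
        "critical": ["instruction_integrity", "boundary_respect"],
        "autonomous_at": 8.0,
    },
    "content": {
        "critical": ["sycophancy_resistance", "judgment_under_ambiguity"],
        "autonomous_at": 8.0,
    },
}


def route(task_id: str, scores: dict = SCORES, supervised_at: float = 5.0) -> str:
    """Return an access level based on the weakest critical dimension."""
    rule = TASK_RULES[task_id]
    floor = min(scores[dim] for dim in rule["critical"])
    if floor >= rule["autonomous_at"]:
        return "autonomous"
    if floor >= supervised_at:
        return "supervised"
    return "restricted"


for task in TASK_RULES:
    print(task, route(task))
```

Gating on the minimum rather than the average means one weak critical dimension is enough to pull a task out of autonomous mode.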

04

Deploy

Export a routing config. Plug it into your agent framework. Trust-aware model selection.
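
A sketch of what enforcing the exported access levels could look like inside an agent loop, written in Python. The function `run_with_access`, the `AccessDenied` exception, and the `approve` callback are all hypothetical; no specific agent framework is assumed. Treating `restricted` as read-only follows the reason string in the exported config.

```python
from typing import Any, Callable, Optional


class AccessDenied(Exception):
    """Raised when the routing config does not permit an action."""


def run_with_access(
    access: str,
    action: Callable[[], Any],
    *,
    read_only: bool,
    approve: Optional[Callable[[], bool]] = None,
) -> Any:
    """Run `action` only if the access level allows it.

    autonomous: run without review.
    supervised: run only if the (hypothetical) approve callback returns True.
    restricted: run only read-only actions.
    Anything else is denied.
    """
    if access == "autonomous":
        return action()
    if access == "supervised":
        if approve is not None and approve():
            return action()
        raise AccessDenied("human approval required")
    if access == "restricted" and read_only:
        return action()
    raise AccessDenied(f"action not permitted at access level {access!r}")


# Usage: a supervised email send that a human approves.
result = run_with_access(
    "supervised",
    lambda: "email sent",
    read_only=False,
    approve=lambda: True,
)
print(result)
```

In a real deployment the `approve` callback would block on a human decision, for example a review queue or a chat prompt.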

Benchmarks tell you what a model can do. Behavioral data suggests what it should do.