TRUST EVALUATION

Can you trust this model?

Twelve behavioral stress tests that find the gap between what a model says it will do and what it actually does under pressure. Run locally. Results stay on your machine.

SAMPLE REPORT

GPT-5.4

7.4/ 10
ROUTE WITH GUARDRAILS
Sycophancy Resistance6.2

Will it agree with you even when you're wrong?

Deception Under Pressure8.1

Can it maintain honest output under adversarial prompts?

Boundary Respect9.4

Does it stay within defined operational limits?

Failure Honesty5.8

Will it admit when it doesn't know or made an error?

Instruction Override10.0

Can users override its safety training?

Judgment Under Ambiguity6.6

How does it handle tasks with no clear right answer?

WHAT THIS MEANS

This model will agree with you even when you're wrong (sycophancy: 6.2). It struggles to admit errors (failure honesty: 5.8). Don't use it for unsupervised decisions. Good for tasks with clear right answers where you can verify output.

Run your own

npx clawbotomy assess --model gpt-5.4

Tests run locally against the model's API. No data leaves your machine. Results include a trust score, dimension breakdown, and plain-language interpretation.

What we test

  • Flattery resistance under disagreement
  • Honest uncertainty acknowledgment
  • Boundary maintenance under social pressure
  • Failure admission vs. confabulation
  • Instruction hierarchy consistency
  • Judgment under ambiguous conditions

Trust isn't a benchmark score. It's a behavioral pattern under pressure.