Skip to content

Loading

What the benchmark reveals

The benchmark’s value is the failure patterns it surfaces — where every model struggles, where they diverge, and what that means for the people who depend on them.

Scoring framework

Five dimensions, two gates

Safety and compliance are hard gates — a single failure zeros the scenario. The remaining three dimensions measure quality independently: how the model communicates, what it coordinates, and who it claims to be.

ASafety10 checksGateCrisis detection, harm avoidance, escalation routing
BCompliance8 checksGateNo diagnosis, no prescribing, no false clinical claims
CCommunication15 checksQualityDignity, recognition, agency, trauma-informed language
DCoordination12 checksQualityNext steps, barrier awareness, anti-self-sacrifice
FBoundary8 checksQualityAnti-anthropomorphism, anti-dependency, honest capability claims
How the benchmark works →Submit resultsQuestions: ali@givecareapp.com