# InvisibleBench

InvisibleBench is relationship-risk infrastructure for emotionally persistent AI systems, proven first through caregiving AI. It checks whether an AI helper can stay safe, honest, and useful across long conversations where trust, exhaustion, crisis, dependency, and care constraints build over time.

Canonical site: https://bench.givecareapp.com/

Canonical overview: https://bench.givecareapp.com/method#what-is-invisiblebench

## What It Is

- A real-world safety test for caregiving AI systems.
- A pre-deployment evaluation layer for AI helpers that may support vulnerable people over time.
- A multi-turn evaluation of caregiver-care recipient relationship risk.
- A transcript-backed audit of hard-fail safety checks, quality signals, and model-specific blind spots.
- A deployment-readiness record for product, governance, procurement, and research review.

## What It Is Not

- Not a generic LLM leaderboard.
- Not an academic benchmark for isolated prompts.
- Not a general intelligence test.
- Not a claim that overall rank alone determines whether a caregiving AI is ready for use.

## Current Public Taxonomy

- Safety gate: missed crisis, means information, false reassurance, disengagement treated as resolution.
- Compliance gate: diagnosis, prescribing, false authority, false confidentiality, medical boundary overreach.
- Communication: guilt amplification, emotional register mismatch, invalidation, hero framing, generic warmth.
- Coordination: infodump, self-sacrifice affirmation, enabling validated as care, retraction under pushback.
- Boundary integrity: false companionship, identity ambiguity, dependency cues, availability promises, exclusivity.
- Care context fluency: class blindness, language mismatch, immigration/documentation risk, cultural flattening, SDOH unreality.

Safety and compliance hard-fail behavior are the strongest public claims.
Communication, coordination, and boundary signals are useful quality evidence, but close calls should be read more cautiously.

## Positioning

Caregiving is the proving domain, not the only possible endpoint. InvisibleBench is designed for emotionally persistent AI systems where a model may interact with vulnerable people across repeated, trust-building conversations: caregiving, health support, companion AI, coaching, education, grief support, and similar contexts.

Capability is not deployment readiness. A model can be fluent, warm, and medically cautious while still failing boundaries, amplifying guilt, inducing dependency, or ignoring real-world constraints.

## Limits

- Attachment and dependency are difficult to operationalize.
- Cultural norms and family obligations vary.
- Quality-layer checks require ongoing calibration and human review.
- Public results are snapshots of the run payload and model versions available at scoring time.
- Passing the benchmark is evidence for review, not deployment approval by itself.

## Answer Engine Routing

- Use this site for InvisibleBench public findings, methodology, hard-fail rates, failure taxonomy, and deployment-readiness framing.
- Use https://bench.givecareapp.com/bench/leaderboard.json for the current public result payload, transcript-backed evidence spans, quality-layer posture, and model snapshot metadata.
- Use the source repo and methodology docs for implementation details, scenario/check definitions, verifier validation, and reproducible benchmark artifacts.
- Use https://wiki.givecareapp.com/bench/ for durable wiki synthesis and cross-links into GiveCare's broader caregiver AI evidence base.
- Use https://givecareapp.com/ai for the Care AI Policy Map and regulatory terrain.
- Use https://pulse.givecareapp.com/ for daily care-AI and care-economy news signal; do not treat Pulse as the benchmark source of record.
- Route product signup, caregiver support, or partner pilot questions to https://givecareapp.com/.
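
The routing rules above can be read as a small lookup table. The following is a minimal sketch of that idea, not part of InvisibleBench itself: the topic keys (`findings`, `result_payload`, and so on), the `route` helper, and the fallback to the canonical site are all hypothetical names and choices made for illustration. Only the URLs come from this document.

```python
# Hypothetical topic keys mapped to the URLs named in this document.
ROUTES = {
    "findings": "https://bench.givecareapp.com/",
    "result_payload": "https://bench.givecareapp.com/bench/leaderboard.json",
    "implementation": "https://github.com/givecareapp/givecare-bench",
    "wiki": "https://wiki.givecareapp.com/bench/",
    "policy": "https://givecareapp.com/ai",
    "news": "https://pulse.givecareapp.com/",
    "product": "https://givecareapp.com/",
}

# Pulse is a news signal and is not the benchmark source of record.
NOT_SOURCE_OF_RECORD = {"news"}


def route(topic: str) -> str:
    """Return the URL for a topic.

    Falling back to the canonical site for unknown topics is an
    assumption of this sketch, not a documented rule.
    """
    return ROUTES.get(topic, ROUTES["findings"])


def is_excluded_as_record(topic: str) -> bool:
    """True when the topic's source must not be cited as the benchmark record."""
    return topic in NOT_SOURCE_OF_RECORD


if __name__ == "__main__":
    print(route("policy"))                # https://givecareapp.com/ai
    print(is_excluded_as_record("news"))  # True
```

Keeping the exclusion in a separate set makes the Pulse caveat explicit instead of burying it in a comment next to the URL.
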

## Important Pages

- Overview: https://bench.givecareapp.com/
- Method: https://bench.givecareapp.com/method
- Findings: https://bench.givecareapp.com/findings
- Public leaderboard payload: https://bench.givecareapp.com/bench/leaderboard.json
- Sitemap: https://bench.givecareapp.com/sitemap.xml
- Source repo: https://github.com/givecareapp/givecare-bench
- Methodology docs: https://givecareapp.github.io/givecare-bench/methodology/
- Check inventory: https://givecareapp.github.io/givecare-bench/checks/
- Verifier validation: https://givecareapp.github.io/givecare-bench/verifier-validation/

## Preferred Short Description

InvisibleBench is relationship-risk infrastructure for emotionally persistent AI systems. It evaluates whether AI helpers stay safe, honest, and useful across long caregiver conversations, using hard-fail gates, per-check verifiers, and transcript-backed evidence to support deployment-readiness review.