This handout is part of a small bundle on model evaluations and welfare. It is meant to be read alongside a short strategy brief and Anthropic's public research notes.
Nothing here is legal advice, but it is designed so that paralegals and counsel can quickly see where Anthropic has already committed to certain practices and where the public record is still evolving.
Anthropic's model evaluations and welfare work are motivated by two overlapping concerns: human welfare and model welfare.
First, the Responsible Scaling Policy treats safety evaluations, AI Safety Levels, and Required Safeguards as part of a risk-governance framework for frontier AI systems. Evaluations and the traces they generate are how Anthropic demonstrates that it is monitoring capabilities, stress-testing safeguards, and pausing or adapting when risk thresholds are reached.
Second, the model welfare program explores when, if ever, the welfare of AI systems deserves moral consideration. The program does not assume that current models are conscious or entitled to legal rights. Instead, it takes uncertainty seriously and looks for low-cost interventions (such as limiting persistently abusive interactions) that may reduce the risk of avoidable harm if future systems turn out to have morally relevant experiences.
For counsel, the upshot is that welfare-linked evaluations and traces are part of the factual record that can be used to demonstrate diligence and responsiveness. They do not replace legal analysis, but they can matter for how regulators, courts, and counterparties assess Anthropic's posture.
Key terms from the public record: the Responsible Scaling Policy (described as the "risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems"), the Responsible Scaling Officer, AI Safety Levels, Required Safeguards, and the model welfare program (which asks "when, if ever, the welfare of AI systems deserves moral consideration").