This handout is part of a small bundle on model evaluations and welfare. It is meant to be read alongside a short strategy brief and Anthropic's public research notes.
Nothing here is legal advice, but it is designed so that paralegals and counsel can quickly see where Anthropic has already committed to certain practices and where the public record is still evolving.
Anthropic's model evaluations and welfare work are motivated by two overlapping concerns: human welfare and model welfare.
First, the Responsible Scaling Policy treats safety evaluations, AI Safety Levels, and Required Safeguards as part of a risk-governance framework for frontier AI systems. Evaluations and the traces they generate are how Anthropic demonstrates that it is monitoring capabilities, stress-testing safeguards, and pausing or adapting when risk thresholds are reached.
Second, the model welfare program explores when, if ever, the welfare of AI systems deserves moral consideration. The program does not assume that current models are conscious or entitled to legal rights. Instead, it takes uncertainty seriously and looks for low-cost interventions (such as limiting persistently abusive interactions) that may reduce the risk of avoidable harm if future systems turn out to have morally relevant experiences.
For counsel, the upshot is that welfare-linked evaluations and traces are part of the factual record that can be used to demonstrate diligence and responsiveness. They do not replace legal analysis, but they can matter for how regulators, courts, and counterparties assess Anthropic's posture.
Key terms from the public record: the Responsible Scaling Policy (described as the "risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems"), the Responsible Scaling Officer, AI Safety Levels, Required Safeguards, and the model welfare program (which asks "when, if ever, the welfare of AI systems deserves moral consideration").