Foreseeable misuse vectors

Vector | Counsel questions | Controls / levers | Evidence claim IDs
V-COPYRIGHT-MDL
Copyright / training-data exposure and multidistrict litigation (MDL) aggregation
  • What is the jurisdictional posture (individual actions vs. MDL) and what does that imply for discovery/compliance burden?
  • Are training-data provenance and opt-out/permissions controls documented and auditable?
  • Do commercial terms allocate IP risk and define indemnities/exclusions clearly?
  • Data governance + provenance logging
  • Rights clearance/opt-out tooling; dataset hygiene
  • Contractual allocations (terms, indemnities, limitations)
  • Model evaluation for memorization / verbatim reproduction
S1-RADAR-01, S1-RADAR-02, S1-RADAR-03, S1-RADARQ-01, S1-RADARQ-06, S1-RADARQ-07, S1-RADARQ-11, S1-RADARQ-12, S1-RADARQ-13, S1-RADARQ-17
V-HALLUCINATION-DEFAMATION
Hallucination → defamation / professional harm; process failures in legal contexts
  • What safeguards exist for high-stakes outputs (citations, legal research, medical advice)?
  • Are disclaimers, verification workflows, and audit logs available to reduce reliance risk?
  • Is there an incident response plan for false statements causing third-party harm?
  • Citation enforcement + retrieval grounding
  • Human-in-the-loop review for high-stakes use
  • Output logging + red-team evaluation for hallucination patterns
  • Prompt/UX guardrails (verification, refusal policies)
S1-RADARQ-10, S1-RADARQ-14, S1-RADARQ-16
V-YOUTH-HARM
Youth harm / self-harm coaching allegations; safety policy enforcement
  • What controls exist for minors (parental controls, age gating, content filters) and how are they enforced?
  • What is the safety case for responding to self-harm / crisis content?
  • How are safety incidents documented, and what commitments exist to update mitigations?
  • Age gating + parental controls + safe completion policies
  • Crisis escalation protocols and refusal patterns
  • Monitoring + incident response + postmortem publication
  • Evaluation suites for self-harm persuasion risk
S1-RADARQ-03, S1-RADARQ-09, S1-RADARQ-18
V-SYCOPHANCY
Sycophancy / over-compliance behavior drift and safety regressions
  • How are behavior changes detected and rolled back (eval gating, canarying)?
  • What commitments exist for transparency when regressions occur?
  • How do we evidence that mitigation changes are effective and durable?
  • Pre-deployment eval gating (refusal, sycophancy, unsafe compliance)
  • Versioned system prompts / policies with change logs
  • Postmortems + mitigations + monitoring telemetry
S1-RADAR-04, S1-RADARQ-04
V-GOVERNANCE-POLICY
Governance frameworks and usage policies as risk controls (provider-specific)
  • What are the provider’s explicit commitments (policies, scaling triggers, governance checkpoints)?
  • Do internal controls map to these commitments (auditable)?
  • Which commitments are contractual vs. voluntary/public?
  • Policy crosswalk to internal controls
  • Contractual incorporation where appropriate
  • Audit trail for compliance and exceptions
S1-008, S3-002
V-INTERPRETABILITY-EVALS
Interpretability & evaluation evidence (capability/risk understanding)
  • What evidence exists of interpretability/eval work that could inform reasonable safety measures?
  • How are evaluation findings integrated into shipping gates and incident response?
  • What are the limitations of interpretability claims in legal argumentation?
  • Evaluation documentation + governance integration
  • Clear scoping statements and limitations
  • Maintain primary sources + pinpoint evidence anchors
S1-001, S3-006