Launch Day Runbook

An operator's checklist for the first 24 hours of a production agent. Print it, pin it, keep it open in a tab. The point is not to memorize — it's to have one place to look when something feels off and you need to act in 60 seconds.

Last updated: April 26, 2026

Before you flip the switch (T-1 hour)

Run am doctor. Every check should pass. If anything is WARN or FAIL, fix it before going live. The FAIL paths are usually missing API DNS, stale auth, or an MCP that didn't register.
Confirm hard-cap is configured at /billing in the dashboard. Defaults are sensible ($5,000/mo monthlyCapCents, every meter set to block). For this first agent, leave defaults. You can move production-critical meters to alert-only once you trust the behavior.
Run a smoke test from a non-prod identity: am email send --to you@yourdomain.com, then if voice-enabled, am voice place --to YOUR_PHONE --consent-source business-relationship. Both should succeed and emit audit events.
Confirm your phone numbers (if any) are active in the dashboard, not pending or port_in_progress.
Confirm FEATURE_PHONE_ENABLED matches your expectation. If you're launching email-only first, that flag stays off until phone is wanted.

During launch (first 60 minutes)

Open am tail --agent <your-agent-id> in one terminal. Every action your agent takes streams here in real time. If the stream dries up unexpectedly the agent stalled.
Open /audit in the dashboard in another tab. Paste the correlation ID of any specific workflow that looks weird and follow the chain.
Watch /billing usage meters. The visible bars update as events flow in. If voice minutes spike to 20% of your monthly cap in the first hour, investigate before it's 100%.
Tail /security for failed-auth attempts. A burst of failed ak_ usage from an unfamiliar IP is the canary for a leaked agent key.

If something looks off — escalation levers

These are listed in increasing severity. Use the lightest one that addresses the problem.

Lower the hard cap. Go to /billing → Hard caps → set the offending meter's cap override to a number just above current MTD spend. The next billable action through that meter returns HTTP 402. No code change, no redeploy.
Disable a single meter. Same panel, flip the meter's Enforce checkbox off if you need to surgically pause one channel without touching the others. (Note: this disables the cap; if you want the meter to keep enforcing but not block, use alert-only mode instead.)
Rotate the agent's API key. /install/api-keys → revoke the agent's ak_ → generate new → swap in your env. The compromised key dies on the next request.
Pause the agent. Dashboard /agents/<id> → set status to paused. All channels stop accepting requests for that agent. The orchestrator continues to run but every tool call returns 403 with a clear message.
Disable global hard cap to "off" only if you understand the consequences. This means no meter blocks. Use only if you are watching the agent live and accept the risk of unbounded spend.

What Anima does automatically (so you don't have to)

TCPA gate: every outbound voice call requires consent_source. Missing → 403 before the dial.
RND check: Twilio Lookup against the FCC Reassigned Numbers Database. Reassigned numbers never dial.
Time-of-day window: 8am–9pm local time enforced from area code, before the dial.
Per-tier daily call cap: Free outbound is zero. Starter, Growth, and Scale have tier-appropriate ceilings.
Hard-cap default ON: at 90% of the configured monthly cap you get a warning email; at 100% the meter blocks (unless set to alert-only).
100 req/sec rate limit per org. Bursts above are queued briefly, then 429.
Correlation IDs: every action ties to a workflow chain. /audit?correlation_id=... renders the sequence.

If you need a human

We respond to launch-day support faster than any other category, because launch day is when problems compound. Your options:

Email: support@useanima.sh — include your org ID, the correlation ID of the workflow that misbehaved, and the timestamp range. We can pull the chain in 30 seconds with that.
Slack: paying customers on Growth and above get a private support channel. DM us if you don't have an invite yet.
Critical security incidents: security@useanima.sh (suspected key leak, abuse, fraud). PGP key on /security.

After launch (T+24 hours)

Review /audit for the agent's full first-day chain. Look for unexpected patterns: outbound to addresses you didn't intend, vault reads of credentials the agent shouldn't need, repeated failed actions.
Review /billing usage. Did any meter cross 50% in 24 hours? Plan ahead — projected MTD = (24h spend) × 30. Adjust caps before week 2 surprises you.
Decide which meters can move to alert-only. Once you trust the agent's behavior pattern, alert-only is the right setting for production channels where pausing mid-customer-call is worse than overspend.
Subscribe to /changelog/feed.xml so you see platform updates that might affect your agent. We post any behavior-changing release before it lands.