Building Voice-Enabled AI Agents with Anima | Anima Blog

Voice lets AI agents take part in real-time conversations over the traditional phone network. Anima hides the complexity of SIP trunking and media handling behind a unified SDK.

Provisioning a Phone Number

Every voice-capable agent requires a dedicated identity with an associated E.164 phone number. You can provision local or toll-free numbers directly through the Anima SDK.

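// Top-level await requires an ES module (e.g. "type": "module" in package.json)
import { Anima } from "@anima/sdk";

// Authenticate with the API key from the environment (never hard-code it)
const am = new Anima(process.env.AM_API_KEY);

// Create an identity for the agent with voice and SMS capabilities
const identity = await am.identities.create({
  name: "Customer Support Agent",
  capabilities: ["voice", "sms"]
});

// Provision a US local number and attach it to that identity
const number = await am.voice.provisionNumber({
  identityId: identity.id,
  countryCode: "US",
  type: "local"
});

// phoneNumber is the E.164-formatted number assigned to the agent
console.log(`Provisioned: ${number.phoneNumber}`);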
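An E.164 number consists of a leading +, a country code, and the subscriber number, with at most 15 digits in total. If your agent takes numbers from users or a CRM before dialing or storing them, a small normalizer can reject malformed input early. The sketch below is a minimal example under one stated assumption: 10-digit numbers without a country code default to +1. toE164 is a hypothetical helper and is not part of the Anima SDK.

```javascript
// Hypothetical helper (not part of the Anima SDK): normalize a user-entered
// phone number to E.164 ("+" followed by up to 15 digits, no leading zero).
// Assumption: 10-digit numbers without a "+" are North American (+1).
function toE164(input, defaultCountryCode = "1") {
  const trimmed = String(input).trim();
  const hasPlus = trimmed.startsWith("+");

  // Strip everything that is not a digit: spaces, dashes, dots, parentheses, "+"
  let digits = trimmed.replace(/\D/g, "");

  // Prepend the default country code to bare 10-digit national numbers
  if (!hasPlus && digits.length === 10) {
    digits = defaultCountryCode + digits;
  }

  // Country code cannot start with 0; total length must not exceed 15 digits
  if (!/^[1-9]\d{1,14}$/.test(digits)) {
    return null;
  }
  return `+${digits}`;
}

console.log(toE164("(555) 012-3456"));    // +15550123456
console.log(toE164("+44 20 7946 0958"));  // +442079460958
console.log(toE164("12345678901234567")); // null (17 digits is too long)
```

Returning null instead of throwing lets the caller decide whether to re-prompt the user or skip the record.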

Configuring Voice Webhooks

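When an inbound call reaches your agent, Anima sends a webhook (an HTTP POST) to your configured endpoint. The payload identifies the event, the call (callId), and the caller (from). Your JSON response carries the call-control instructions. In the example below, the response answers the call and gives Anima a second endpoint for the streaming media session.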

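import express from "express";

const app = express();
app.use(express.json()); // parse JSON webhook bodies

app.post("/webhooks/voice", async (req, res) => {
  const { event, callId, from } = req.body;

  if (event === "call.initiated") {
    console.log(`Inbound call ${callId} from ${from}`);

    // Respond with call control instructions: answer the call and send
    // subsequent media/speech events to the media webhook
    return res.json({
      action: "answer",
      webhookUrl: "https://your-api.com/webhooks/voice/media"
    });
  }

  // Acknowledge every other event so the request does not hang
  res.sendStatus(200);
});

app.listen(3000); // example port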
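Keeping the routing decision in a plain function, separate from Express, makes webhook logic easy to unit-test without starting a server. The sketch below answers call.initiated with the same answer/webhookUrl instruction used above and acknowledges any other event with an empty 200. The name routeVoiceEvent and the { status, body } return shape are illustrative assumptions, not Anima types.

```javascript
// Illustrative sketch: decide how to respond to a voice webhook without
// depending on Express. The { status, body } shape is an assumption for this
// example, not an Anima type.
const MEDIA_WEBHOOK_URL = "https://your-api.com/webhooks/voice/media";

function routeVoiceEvent(payload) {
  const { event } = payload ?? {};

  if (event === "call.initiated") {
    // Answer and direct follow-up media/speech events to the media webhook
    return {
      status: 200,
      body: { action: "answer", webhookUrl: MEDIA_WEBHOOK_URL }
    };
  }

  // Any other event: acknowledge with an empty 200 so the sender is not left waiting
  return { status: 200, body: null };
}

// Usage inside an Express route:
// const { status, body } = routeVoiceEvent(req.body);
// if (body) res.status(status).json(body); else res.sendStatus(status);
```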

Handling Media and TTS

Once a call is answered, you can either bridge the audio stream to your LLM or use Anima's built-in Text-to-Speech (TTS) engine. The platform supports multiple synthesis providers, including ElevenLabs and Deepgram.

Outbound Call Control

For proactive outreach, create an outbound call and supply either a script to speak when the callee answers or a stream URL.

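// Place an outbound call from the number provisioned earlier
const call = await am.voice.calls.create({
  from: number.phoneNumber,
  to: "+15550123456", // destination in E.164 format
  onAnswer: {
    // Spoken via TTS as soon as the callee picks up
    speak: "Hello, this is your AI assistant calling from Anima.",
    voice: "en-US-Neural2-F" // TTS voice to use
  }
});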

Complex Interactions

Multi-turn conversations depend on handling speech events asynchronously. Anima posts a 'speech.detected' event containing the caller's transcript to the media webhook. Pass the transcript to your LLM and speak the reply back into the call.

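app.post("/webhooks/voice/media", async (req, res) => {
  const { event, callId, transcript } = req.body;

  // Acknowledge right away so a slow LLM call does not hold the request open
  res.sendStatus(200);

  if (event === "speech.detected") {
    try {
      // getLLMResponse is your own function that calls your LLM provider
      const aiResponse = await getLLMResponse(transcript);

      // Speak the generated reply back to the caller
      await am.voice.calls.update(callId, {
        action: "speak",
        text: aiResponse
      });
    } catch (err) {
      // Without this, a rejected promise would go unhandled
      console.error(`Failed to respond on call ${callId}:`, err);
    }
  }
});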
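Each speech.detected event arrives as its own HTTP request, so two utterances from the same caller can be in flight at once. In that case, a fast LLM reply could be spoken before a slower, earlier one. One way to keep replies in order is to chain work per callId. The sketch below does this. createPerCallQueue is a hypothetical helper, and it assumes a single Node.js process; a deployment with several instances would need a shared queue instead.

```javascript
// Hypothetical helper: run async tasks one at a time per callId, in arrival order.
// Assumes a single Node.js process; queue state lives in memory.
function createPerCallQueue() {
  const tails = new Map(); // callId -> promise for the most recently queued task

  return function enqueue(callId, task) {
    const previous = tails.get(callId) ?? Promise.resolve();

    // Start only after the previous task settles; a failed task must not block later ones
    const run = previous.catch(() => {}).then(() => task());
    tails.set(callId, run);

    // Drop the entry once this task is still the newest, so finished calls free memory
    const cleanup = () => {
      if (tails.get(callId) === run) tails.delete(callId);
    };
    run.then(cleanup, cleanup);

    return run; // resolves with the task's result, rejects with its error
  };
}

const enqueue = createPerCallQueue();

// Usage in the media webhook, after acknowledging the request:
// enqueue(callId, async () => {
//   const aiResponse = await getLLMResponse(transcript);
//   await am.voice.calls.update(callId, { action: "speak", text: aiResponse });
// }).catch((err) => console.error(err));
```

Tasks for different calls still run concurrently. Only tasks that share a callId are serialized.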

Managing Call State

Multi-turn conversations need state that survives between webhook requests, because each speech event arrives separately. Anima provides a persistent session store keyed by callId. This lets agents recall context from earlier in the call without re-sending the entire history to the LLM on every turn.
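Anima manages its session store for you. To make the idea concrete, here is a minimal in-memory stand-in keyed by callId. It keeps every turn and any structured facts worth remembering, such as an order number, but hands the LLM only the facts plus the most recent turns. CallSessionStore, its method names, maxTurns, and the { role, content } turn shape are all assumptions for illustration, not the Anima session API.

```javascript
// Illustrative in-memory stand-in for a per-call session store.
// Not the Anima API: class name, method names, and turn shape are assumptions.
class CallSessionStore {
  constructor(maxTurns = 6) {
    this.maxTurns = maxTurns;  // how many recent turns to send to the LLM
    this.sessions = new Map(); // callId -> { turns: [{ role, content }], facts: {} }
  }

  // Get (or lazily create) the session for a call
  get(callId) {
    if (!this.sessions.has(callId)) {
      this.sessions.set(callId, { turns: [], facts: {} });
    }
    return this.sessions.get(callId);
  }

  addTurn(callId, role, content) {
    this.get(callId).turns.push({ role, content });
  }

  // Remember a structured detail for the rest of the call
  remember(callId, key, value) {
    this.get(callId).facts[key] = value;
  }

  // Context for the next LLM request: all known facts plus only the recent turns
  contextFor(callId) {
    const { turns, facts } = this.get(callId);
    return { facts: { ...facts }, recentTurns: turns.slice(-this.maxTurns) };
  }

  // Free memory when the call ends
  end(callId) {
    this.sessions.delete(callId);
  }
}

const store = new CallSessionStore(2);
store.addTurn("call-1", "user", "I need help with my order.");
store.addTurn("call-1", "assistant", "Sure, what is the order number?");
store.addTurn("call-1", "user", "It's 4417.");
store.remember("call-1", "orderNumber", "4417");
console.log(store.contextFor("call-1"));
```

Extracting facts into a separate map means they survive even after the turn that mentioned them falls out of the recent-turns window.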

Robust voice agents also listen for DTMF (dual-tone multi-frequency) signals, i.e. keypad presses, as a fallback when audio quality is low. These signals arrive through the same webhook pipeline, which makes hybrid voice/keypad interfaces possible.
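If keypad input arrives one digit per event, the agent needs to buffer digits until the caller signals the entry is complete, commonly with the # key. The sketch below assumes each event carries a callId and a single digit (0-9, * or #). It uses # to submit and * to clear. The event field names and the createDtmfCollector helper are hypothetical, not Anima's payload schema.

```javascript
// Hypothetical DTMF collector: buffers digits per call until "#" is pressed,
// then returns the complete entry. "*" clears the current entry.
// Assumed event shape: { callId: string, digit: "0"-"9" | "*" | "#" }.
function createDtmfCollector(maxLength = 16) {
  const buffers = new Map(); // callId -> digits entered so far

  return function onDigit({ callId, digit }) {
    // Only the 12 standard keypad keys are accepted
    if (!/^[0-9*#]$/.test(digit)) {
      return { status: "ignored" };
    }

    const current = buffers.get(callId) ?? "";

    if (digit === "*") {
      buffers.delete(callId);
      return { status: "cleared" };
    }

    if (digit === "#") {
      buffers.delete(callId);
      return { status: "complete", value: current };
    }

    // Guard against runaway input on a noisy line
    const next = current + digit;
    if (next.length > maxLength) {
      buffers.delete(callId);
      return { status: "overflow" };
    }

    buffers.set(callId, next);
    return { status: "collecting", value: next };
  };
}

const onDigit = createDtmfCollector();
for (const digit of ["4", "4", "1", "7", "#"]) {
  const result = onDigit({ callId: "call-1", digit });
  if (result.status === "complete") {
    console.log(`Caller entered ${result.value}`); // Caller entered 4417
  }
}
```

A "complete" result can then be handled like a transcript, for example by passing it to the same LLM or menu logic that handles speech.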