- Overview
- Prerequisites
- Deployment Steps
- Deployment Validation
- Running the Guidance
- Next Steps
- Cleanup
- FAQ, Known Issues, Additional Considerations, and Limitations
- Revisions
- Notices
- Authors
This Guidance provides a sample foundation for building real-time voice AI agents on AWS. It demonstrates how to build a voice assistant that handles phone calls over SIP/PSTN and voice interactions in web and mobile applications via WebRTC.
- Flexible orchestration -- Uses Pipecat, an open-source framework for voice AI pipelines
- Plug-in models -- Supports automatic speech recognition (ASR) or speech-to-text (STT), text-to-speech (TTS), and large language model (LLM) providers
- Phone and web -- Accepts phone calls via Daily SIP and public switched telephone network (PSTN) dial-in, and web applications through Daily managed WebRTC
- Local prototyping -- Test the full voice pipeline from your browser via WebRTC without any Daily.co account, phone number, or SIP infrastructure
- Extensible agents -- Extends capabilities through an agent-to-agent (A2A) hub-and-spoke architecture with AWS Cloud Map discovery
- AWS infrastructure -- Runs on Amazon ECS Fargate with auto-scaling, Amazon Bedrock for LLM, and optional self-hosted STT/TTS on Amazon SageMaker
Note: While Pipecat supports speech-to-speech models, this Guidance does not yet support them. Support speech-to-speech models is in our roadmap.
Detailed diagram
graph TB
Caller["Caller (Phone)"]
Daily["Daily.co (WebRTC + SIP)"]
APIGW["Amazon API Gateway"]
Lambda["AWS Lambda (BotRunner)"]
subgraph AWS["AWS Cloud"]
subgraph ECS["ECS Fargate"]
subgraph Pipeline["Pipecat Pipeline"]
Transport_In["Transport"] --> VAD["VAD"] --> STT["STT"]
STT --> LLM["LLM + Tools"]
LLM --> TTS["TTS"] --> Transport_Out["Transport"]
end
end
SM_STT["Amazon SageMaker
(STT Endpoint)
BiDi HTTP/2"]
Bedrock["Amazon Bedrock
(LLM + Tools)"]
SM_TTS["Amazon SageMaker
(TTS Endpoint)
BiDi HTTP/2"]
CloudMap["AWS Cloud Map
(A2A Discovery)"]
KB["KB Agent (A2A)"]
CRM["CRM Agent (A2A)"]
end
Caller <-->|PSTN| Daily
Daily -->|Webhook| APIGW --> Lambda
Lambda -->|POST /call| ECS
Daily <-->|WebRTC Audio| ECS
STT -.-> SM_STT
LLM -.-> Bedrock
TTS -.-> SM_TTS
Bedrock --> CloudMap
CloudMap --> KB
CloudMap --> CRM
The architecture flow:
- A caller dials the PSTN phone number, which connects to Daily.co via SIP.
- Daily.co sends a webhook to Amazon API Gateway, which triggers the BotRunner AWS Lambda function.
- The Lambda function sends a POST /call request to the Amazon ECS Fargate service to spawn a voice pipeline.
- The Pipecat pipeline processes audio in real-time: Transport receives audio, VAD detects speech, STT converts to text, the LLM (Amazon Bedrock) generates a response with optional tool calls, and TTS converts the response back to audio.
- STT and TTS can run on Amazon SageMaker endpoints (audio stays in Amazon VPC) or via cloud APIs.
- When tool calling is enabled, the LLM can invoke local tools (e.g., time, transfer) or discover and call remote A2A capability agents (Knowledge Base, CRM) via AWS Cloud Map.
You are responsible for the cost of the AWS services used while running this Guidance. As of March 2026, the cost for running this Guidance with the default settings in the US East (N. Virginia) Region is approximately $135-200 per month for Cloud API mode, or $935-1,200 per month for Amazon SageMaker mode (due to GPU instance costs).
We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.
Cloud API Mode:
| AWS Service | Dimensions | Cost [USD] |
|---|---|---|
| Amazon ECS Fargate | 1 task, 2 vCPU / 4 GB, always-on | ~$70/month |
| NAT Gateway | 1 gateway + data processing | ~$35/month |
| Network Load Balancer | 1 NLB, minimal LCUs | ~$18/month |
| AWS Lambda | ~10,000 invocations/month | ~$0.01/month |
| Amazon API Gateway | ~10,000 requests/month | ~$0.04/month |
| AWS Secrets Manager | 3 secrets | ~$1.20/month |
| Amazon CloudWatch | Logs, metrics, dashboard, alarms | ~$10-15/month |
| Amazon Bedrock | Pay-per-token LLM usage | ~$5-50/month |
Amazon SageMaker Mode (additional costs):
| AWS Service | Dimensions | Cost [USD] |
|---|---|---|
| Amazon SageMaker STT Endpoint | ml.g6.2xlarge, always-on | ~$350/month |
| Amazon SageMaker TTS Endpoint | ml.g6.12xlarge, always-on | ~$450/month |
Third-party service costs (Daily.co, Deepgram, Cartesia) vary by usage and are not included above. Refer to each provider's pricing page.
These deployment instructions are optimized to best work on macOS or Amazon Linux 2023 AMI. Deployment on Windows may require additional steps (e.g., WSL2).
Required tools:
- Node.js 18+ -- for CDK infrastructure deployment
- Python 3.12+ -- for voice agent development and testing
- AWS CLI v2 -- configured with credentials for your target account
- Finch (or Docker) -- for container image builds
# Verify installations
node --version # v18.x or higher
python3 --version # 3.12.x or higher
aws --version # aws-cli/2.x
finch --version # or docker --versionThe following third-party service accounts and API keys are required:
| Service | Purpose | Sign Up |
|---|---|---|
| Daily.co | WebRTC/SIP transport for voice calls | Dashboard |
| STT provider (e.g. Deepgram) | Speech-to-text (cloud API mode) | Provider console |
| TTS provider (e.g. Cartesia) | Text-to-speech (cloud API mode) | Provider console |
- Amazon Bedrock -- Model access must be enabled in your target Region. Go to the Amazon Bedrock console > Model access and enable your preferred LLM.
- VPC -- The deployment creates its own Amazon VPC. No existing VPC is required.
- IAM permissions -- The deploying principal needs permissions to create Amazon VPC, Amazon ECS, AWS Lambda, Amazon API Gateway, AWS Secrets Manager, AWS KMS, Amazon CloudWatch, and (optionally) Amazon SageMaker resources.
Amazon SageMaker mode only:
- GPU quota for
ml.g6.2xlargeandml.g6.12xlargein your target Region. Request via Service Quotas. - Deepgram Marketplace subscriptions for STT and TTS model packages.
This Guidance uses AWS CDK. If you are using AWS CDK for the first time in your AWS account/Region, run the bootstrap command:
npx cdk bootstrap aws://ACCOUNT_ID/REGION| Service | Limit | Default | Notes |
|---|---|---|---|
Amazon SageMaker ml.g6.2xlarge |
Endpoint instances | 0 | Request increase for Amazon SageMaker mode |
Amazon SageMaker ml.g6.12xlarge |
Endpoint instances | 0 | Request increase for Amazon SageMaker mode |
| Amazon ECS Fargate | On-demand vCPU | 256 | Sufficient for default configuration |
| Amazon Bedrock | Tokens per minute | Varies | Monitor throttling in Amazon CloudWatch |
Request service limit increases via the Service Quotas console.
This Guidance is best suited for AWS Regions that support Amazon Bedrock. Recommended Regions:
- US East (N. Virginia) --
us-east-1 - US West (Oregon) --
us-west-2
Amazon SageMaker mode additionally requires GPU instance availability (ml.g6 family) in the selected Region.
This project includes Claude Code skills that walk you through every step interactively -- checking prerequisites, configuring environment, deploying infrastructure, setting up your phone number, and verifying the result.
-
Clone the repository:
git clone https://github.com/aws-samples/sample-voice-agent.git cd sample-voice-agent -
Open the project in Claude Code (or your preferred AI-assisted IDE).
-
Deploy infrastructure -- Run
/deploy-cloud-api(or/deploy-sagemakerfor production). Claude checks prerequisites, gathers your API keys (Daily, STT/TTS providers), and deploys CDK stacks. Takes ~15 minutes. -
Set up a phone number -- Run
/configure-daily. Claude checks for existing Daily.co numbers (reuses one if available), configures the pinless dial-in webhook, and syncs secrets. You now have a callable phone number. -
Verify deployment -- Run
/verify-deploymentto health-check all infrastructure components.
| Skill | What It Does |
|---|---|
/deploy-cloud-api |
Full deployment using Deepgram + Cartesia cloud APIs |
/deploy-sagemaker |
Full deployment with self-hosted STT/TTS on Amazon SageMaker GPUs |
/configure-daily |
Set up a phone number and configure PSTN dial-in |
/verify-deployment |
Health check all infrastructure components |
/deploy-capability-agents |
Deploy Knowledge Base and/or CRM capability agents |
/create-capability-agent |
Scaffold a new A2A capability agent from scratch |
/create-local-tool |
Add a new tool to the voice pipeline |
/destroy-project |
Release phone number and tear down all AWS resources |
See the full Deployment Guide for step-by-step manual instructions.
-
Clone the repository:
git clone https://github.com/aws-samples/sample-voice-agent.git cd sample-voice-agent -
Navigate to the infrastructure directory and configure environment:
cd infrastructure cp .env.example .env # Edit .env with your AWS region and (for Amazon SageMaker mode) model package ARNs
-
Install dependencies:
npm install
-
Deploy the stacks:
# Deploy with cloud APIs (simpler, no Amazon SageMaker needed) USE_CLOUD_APIS=true ./deploy.sh deploy # Or deploy with Amazon SageMaker (production, audio stays in VPC) ./deploy.sh deploy
-
Configure API keys in AWS Secrets Manager:
./scripts/init-secrets.sh
-
Set up a phone number:
./scripts/setup-daily.sh
-
Capture the deployed resource outputs:
aws cloudformation describe-stacks --stack-name VoiceAgentEcs --query "Stacks[0].Outputs" --output table
After deployment, validate that all resources are running correctly:
-
Check CloudFormation stacks -- Open the AWS CloudFormation console and verify all stacks show
CREATE_COMPLETEstatus. -
Verify Amazon ECS service -- Confirm the voice agent task is running:
aws ecs describe-services --cluster voice-agent-cluster --services voice-agent-service --query "services[0].{status:status,running:runningCount,desired:desiredCount}" -
Test the webhook endpoint -- The Amazon API Gateway URL is available in CloudFormation outputs:
aws cloudformation describe-stacks --stack-name VoiceAgentBotRunner --query "Stacks[0].Outputs[?OutputKey=='WebhookUrl'].OutputValue" --output text -
Run the verification skill -- If using Claude Code, run
/verify-deploymentfor a comprehensive health check of all components including SSM parameters, Amazon ECS service, AWS Secrets Manager, webhook endpoint, and (optionally) Amazon SageMaker endpoints.
Once deployed, call your PSTN phone number to interact with the voice agent.
Test the full voice pipeline from your browser without any Daily.co account, phone number, or SIP infrastructure. Uses Pipecat's SmallWebRTCTransport with a prebuilt WebRTC browser UI.
Prerequisites: Cloud resources must be deployed first (Amazon Bedrock access, Deepgram/Cartesia API keys). Only the transport layer is local -- STT, LLM, and TTS still use cloud services.
cd backend/voice-agent
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set DEEPGRAM_API_KEY, CARTESIA_API_KEY, AWS_REGION
python -m app.local_main
# Open http://localhost:7860 in your browser and click ConnectSpeak into your microphone and hear the agent respond through your speakers. Tool calling, filler phrases, and all pipeline features work identically to production. SIP-only tools (e.g., transfer_to_agent) are automatically excluded.
| Environment Variable | Default | Description |
|---|---|---|
LOCAL_PORT |
7860 |
Port for the local server |
SYSTEM_PROMPT |
Generic assistant | Custom system prompt |
ENABLE_TOOL_CALLING |
false |
Enable LLM tool calling |
ENABLE_FILLER_PHRASES |
true |
Enable filler phrases during tool delays |
The agent handles natural dialogue out of the box. Simply call and speak naturally.
If ENABLE_TOOL_CALLING is set to true (configurable via SSM parameter /voice-agent/config/enable-tool-calling):
| Test Phrase | Tool Tested | Expected Behavior |
|---|---|---|
| "What time is it?" | get_current_time |
Agent responds with the current time |
| "Goodbye" | hangup_call |
Agent says goodbye and ends the call |
| "Transfer me to an agent" | transfer_to_agent |
Call is transferred via SIP REFER (requires TRANSFER_DESTINATION) |
| Test Phrase | Agent | Expected Behavior |
|---|---|---|
| "What's your return policy?" | Knowledge Base | RAG search over uploaded documents |
| "Look up the account for 555-0100" | CRM | Customer lookup and account details |
The Amazon CloudWatch dashboard URL is available in CloudFormation outputs as VoiceAgentEcs.DashboardUrl. Key metrics to observe:
| Metric | Namespace | Target |
|---|---|---|
| E2ELatency | VoiceAgent/Pipeline |
< 2,000ms |
| AgentResponseLatency | VoiceAgent/Pipeline |
< 2,500ms |
| TurnCount | VoiceAgent/Pipeline |
Per call |
| InterruptionCount | VoiceAgent/Pipeline |
Per call |
| AudioRMS / AudioPeak | VoiceAgent/Pipeline |
Audio quality (dBFS) |
| ToolExecutionTime | VoiceAgent/Pipeline |
Per tool invocation |
| ActiveSessions | VoiceAgent/Pipeline |
Concurrent calls |
After successfully deploying and testing the basic voice agent, consider these enhancements:
- Add capability agents -- Run
/deploy-capability-agentsto deploy the Knowledge Base and CRM agents, extending the voice agent with RAG and customer data lookup without modifying the core pipeline. - Create custom tools -- Use
/create-local-toolto add new tools to the voice pipeline (e.g., appointment scheduling, order lookup). Tools use a capability-based registration system and require no pipeline code changes. - Create custom capability agents -- Use
/create-capability-agentto scaffold a new A2A agent with its own container, Dockerfile, and CDK stack. - Switch to Amazon SageMaker mode -- For production deployments where audio must stay within the Amazon VPC, deploy with
/deploy-sagemakerto use self-hosted STT/TTS on GPU instances. - Configure call transfers -- Set the
TRANSFER_DESTINATIONenvironment variable to enable SIP REFER transfers to human agents. See Call Transfers. - Tune auto-scaling -- Adjust
targetSessionsPerTask,sessionCapacityPerTask,minCapacity, andmaxCapacityvia CDK context for your call volume. - Upload knowledge base documents -- Add your own FAQ and policy documents to the S3 bucket backing the Amazon Bedrock Knowledge Base for domain-specific RAG.
To remove all deployed resources and avoid ongoing charges:
Run /destroy-project in Claude Code. This will:
- Release the Daily.co phone number
- Remove pinless dial-in configuration
- Destroy all CDK stacks (capability agents first, then core infrastructure)
- Clean up local output files
-
Release the Daily.co phone number (if purchased):
# List phone numbers curl -s -H "Authorization: Bearer $DAILY_API_KEY" https://api.daily.co/v1/phone-numbers # Release a specific number curl -X DELETE -H "Authorization: Bearer $DAILY_API_KEY" https://api.daily.co/v1/phone-numbers/PHONE_NUMBER_ID
-
Destroy capability agent stacks (if deployed):
cd infrastructure npx cdk destroy VoiceAgentCrmAgent VoiceAgentKbAgent --force -
Destroy core infrastructure stacks:
npx cdk destroy --all --force
-
Verify cleanup -- Confirm no resources remain:
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE --query "StackSummaries[?starts_with(StackName, 'VoiceAgent')].StackName"
Note: Amazon SageMaker endpoints (if deployed) are the most expensive resources. Ensure the Amazon SageMaker stack is fully destroyed to avoid GPU instance charges.
| Mode | STT/TTS | Best For |
|---|---|---|
Cloud API (USE_CLOUD_APIS=true) |
Deepgram + Cartesia cloud APIs | Getting started, development |
| Amazon SageMaker (default) | Self-hosted on GPU instances | Production, data residency |
Cloud API mode requires Deepgram and Cartesia API keys. Amazon SageMaker mode requires Deepgram Marketplace subscriptions and GPU quota.
No response from agent:
- Check AWS Lambda logs for webhook errors
- Verify API keys are correctly configured in AWS Secrets Manager
- Check Amazon ECS service is running and healthy
High latency:
- Check Amazon SageMaker endpoint Amazon CloudWatch metrics
- Review Amazon CloudWatch metrics for Amazon Bedrock latency
- Verify VPC endpoints are configured correctly
No audio output:
- Verify Daily room configuration (SIP enabled)
- Check TTS provider API key is valid
- Review voice agent container logs
- This Guidance creates a NAT Gateway which incurs hourly charges even when idle.
- Amazon SageMaker endpoints (Amazon SageMaker mode) run on GPU instances that are billed per hour irrespective of usage.
- Third-party services (Daily.co, Deepgram, Cartesia) have their own pricing and usage limits.
- The Amazon ECS Fargate service runs at least one task continuously (always-on architecture) to avoid cold start latency.
- Cloud API mode routes audio through the public internet. Use Amazon SageMaker mode if data residency is required.
sample-voice-agent/
├── infrastructure/ # CDK infrastructure code
│ ├── src/
│ │ ├── stacks/ # CloudFormation stacks
│ │ ├── constructs/ # Reusable CDK constructs
│ │ └── functions/ # AWS Lambda function code
│ ├── scripts/ # Deployment & setup scripts
│ └── test/ # Infrastructure tests
├── backend/
│ ├── voice-agent/ # Voice pipeline container (hub)
│ │ ├── app/
│ │ │ ├── services/ # STT/TTS/LLM service factories
│ │ │ ├── tools/ # Tool framework + built-in tools
│ │ │ ├── a2a/ # A2A capability agent integration
│ │ │ ├── pipeline_ecs.py # Pipecat pipeline configuration (Daily transport)
│ │ │ ├── pipeline_local.py # Pipecat pipeline configuration (SmallWebRTC transport)
│ │ │ ├── local_main.py # Local prototyping entry point (FastAPI + browser WebRTC)
│ │ │ ├── observability.py # Metrics observers
│ │ │ └── service_main.py # HTTP service (aiohttp)
│ │ ├── tests/ # Python tests
│ │ └── Dockerfile # Container definition (Python 3.12)
│ └── agents/ # A2A capability agents (spokes)
│ ├── knowledge-base-agent/ # KB RAG agent
│ └── crm-agent/ # CRM agent (5 tools)
├── docs/
│ ├── guides/ # Developer guides
│ ├── patterns/ # Architecture patterns
│ └── reference/ # Reference documentation
└── resources/ # Sample data (KB documents)
| Guide | Description |
|---|---|
| Deployment Guide | Full infrastructure deployment walkthrough (cloud API + Amazon SageMaker) |
| Daily.co Setup | Daily.co phone number and webhook configuration |
| Deepgram Marketplace Setup | Subscribe to Deepgram model packages for Amazon SageMaker mode |
| Call Transfers | Optional SIP REFER transfer to human agents |
| Adding a Capability Agent | Build and deploy a new A2A capability agent |
| Adding a Local Tool | Add tools to the voice agent pipeline |
| Capability Agent Pattern | Architecture reference: hub-and-spoke pattern, latency optimization |
- Maximum concurrent calls per container is configurable (default: 10) but bounded by CPU/memory.
- Cold start for new Amazon ECS tasks takes ~90 seconds. Total time from overload to new capacity: ~3-5 minutes.
- The A2A capability agent discovery relies on AWS Cloud Map polling (default: every 30 seconds).
For any feedback, questions, or suggestions, please use the issues tab under this repo.
| Date | Description |
|---|---|
| March 2026 | Initial release -- Cloud API and Amazon SageMaker deployment modes, A2A capability agents, auto-scaling |
Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
