Frequently Asked Questions

This page contains all frequently asked questions and their answers.

How do credits work in Cekura, including pricing for testing, monitoring, and overages?

Understanding Credits and Pricing in Cekura

Cekura uses a credit-based system to manage usage across testing, monitoring (observability), and evaluation. This allows you to pay only for what you use across different stages of your agent’s lifecycle.

1. Credit Consumption Rates

Credits are consumed based on the specific activity performed within the platform:

Voice Testing: 5 credits per minute of voice run.
Chat-Based Testing: 0.5 credits per message sent by the testing agent.
Monitoring & Observability (Evaluation): 0.2 credits per metric run to evaluate a conversation.

2. Usage Examples

Example A: Testing a Voice Agent If you run a 2-minute voice test and evaluate it with 5 metrics, the total cost is 11 credits:

Voice Duration: 2 minutes × 5 credits = 10 credits
Evaluation: 5 metrics × 0.2 credits = 1 credit

Example B: Monitoring/Observability (Call Evaluation Only) If you are importing external calls (e.g., from Retell or Twilio) for quality assurance and run 10 metrics to evaluate the performance of that call, the total cost is 2 credits:

Evaluation: 10 metrics × 0.2 credits = 2 credits

3. Overages and Manual Top-ups

To ensure your testing and monitoring are never interrupted, Cekura provides flexible credit management options:

Overages: When enabled, overages allow the platform to continue processing calls and metrics even after your base credit limit is reached. This is critical for production monitoring to ensure you don’t lose data during high-volume periods. Overages are billed at 2× your committed credit rate for your tier. On annual contracts, unused credits roll over to the following month; an overage occurs when usage in a month exceeds your monthly credit allocation plus any rolled-over credits from prior months. Overages are invoiced monthly; if a card is on file, it can be charged directly.
Manual Top-ups: If you are in a sandbox or trial phase and run out of credits, our support team can manually add credits to your account to ensure your development work continues without delay.

4. BestPractices for Credit Optimization

To manage your credit burn effectively, consider the following strategies:

Smart Sampling: For high-volume monitoring, configure your metrics to run only on a percentage of calls or specific call types (e.g., excluding calls that go to voicemail) to save on evaluation credits.
Bulk Re-evaluation: Use the re-evaluate feature carefully. Note that re-running metrics on existing calls will consume credits at the standard rate of 0.2 credits per metric. This is often used when you have updated a metric prompt and want to see how it performs on historical data.
Metric Refinement: Periodically review your evaluators. Disabling metrics that are no longer providing actionable insights will reduce your per-call credit cost.
Use the Optimizer: For project-level metrics, use the Cekura Optimizer to ensure your prompts are efficient, which helps maintain high accuracy without unnecessary manual re-runs.

If you need to request a credit top-up, enable overages, or discuss a custom plan, please reach out to us at support@cekura.ai or via your dedicated Slack support channel.

Can I see logs of the webhooks Cekura sends to my endpoint?

Cekura does not currently store or display webhook delivery logs in the dashboard UI. You cannot view a history of outgoing webhook events, their delivery status, or your endpoint’s responses directly from the dashboard. What is available:

Enterprise contracts: Customers on enterprise plans have access to audit logs that can cover webhook activity. Contact your account team for details on what is included.
Per-call investigation: For a specific call or event, the Cekura support team can search internal traces to check webhook delivery. Reach out via your support channel with the specific call details.

For information on the structure and payload format of the webhooks Cekura sends, see the Webhook Format reference page.

Where can I upload knowledge base files on the Cekura dashboard and what is the recommended format for an FAQ-based document?

To provide your AI agent with the necessary context and reduce hallucinations, you can upload knowledge base files directly through the Cekura dashboard.

How to Upload Knowledge Base Files

Log in to your Cekura dashboard.
Navigate to Agent Settings for the specific agent you are configuring.
Look for the Knowledge Base section, typically located on the bottom right of the settings page.
Click to upload your file.

If I add a custom metric to a specific agent, will it automatically apply to all agents or can it be applied to a single agent only?

In Cekura, you can choose whether a metric applies to a single agent or across your entire project. By default, metrics are created at the Agent-level, meaning they only apply to the specific agent they were created for.

Agent vs. Project Metrics

Agent-level Metrics: These are specific to one agent and are ideal for unique evaluation criteria tailored to a specific persona or use case.
Project-level Metrics: These apply to all agents within a project, allowing for consistent evaluation across your entire workspace.

How to Manage and Create Metrics

Navigate to Metrics: Select the Metrics section from the left-hand navigation panel.
Switch Views: Use the tabs at the top of the page to switch between Agent Metrics and Project Metrics.
Create a Metric: The creation button will change based on your active tab (e.g., “Create Agent Metric” or “Create Project Metric”).
Convert an Agent Metric to Project-level: If you have an existing agent metric that you want to apply to all agents, click the three dots (…) next to the metric and select Move to Project Metric.

For more information on configuring these settings, please refer to our Basic and Advanced Metrics Guide.

What is the best way to perform A/B testing to compare two different conversational agents?

To perform effective A/B testing between two conversational agents (such as a production agent and a new version with a modified conversational flow) in Cekura, we recommend the following workflow:

Generate Test Cases (Evaluators): Start by generating a set of high-quality test cases. We recommend generating at least 10 good test cases that cover your agent’s primary goals and edge cases. You can use Cekura’s automated generation tools to create these based on your agent’s context.
Establish a Baseline (Old Agent): Run these test cases against your existing ‘old’ agent first. This serves two purposes: it establishes a performance baseline and allows you to verify that the generated test cases are accurate and produce the expected results for your current flow.
Run Comparative Tests (New Agent): Execute the exact same set of test cases against your ‘new’ agent version.
Analyze and Compare: Once both runs are complete, navigate to the Results section in the Cekura dashboard. Select the two different test results you wish to evaluate, and then press the Compare button. This will provide a side-by-side comparison of how each agent performed against your defined metrics.

By running identicaltest cases against both agents, you can directly measure improvements or regressions in performance, accuracy, and flow adherence. For more information on setting up your agents for testing, refer to the Agent Setup Guide. To learn more about how to interpret the results, see our documentation on Basic and Advanced Metrics. Yes, Cekura supports several options for exporting call data and sharing reports.

1. CSV Export of Selected Calls

On the Calls (Observability) page, you can select individual call records and export them as CSV. This is useful for sharing specific calls with stakeholders or running offline analysis on a targeted subset.

2. Dashboards with Daily Reports

Create a dashboard using the metrics and breakdowns you need — call volumes, outcome distributions, metric scores, and more. Use the ⋮ menu on any dashboard to include it in a daily report email sent to project members. Daily reports can be routed to a specific address by configuring project membership. See Configuring daily report recipients.

3. Using the Cekura API

For custom or automated reporting, use the Get Call API to programmatically fetch call data and metric evaluations and format them to match your reporting requirements.

4. Custom Reporting Support

If you need a specific breakdown not covered by the options above, reach out to the Cekura support team. We can provide specific endpoints tailored to your organisation’s reporting requirements.

Is there a way to update or add metadata to data after it has been posted to the observability endpoint?

Currently, it is not possible to modify or add metadata to a call record once it has been posted to the Cekura observability endpoint. Data sent to this endpoint is treated as immutable to maintain the integrity of the evaluation results and historical reporting.

Best Practices for Metadata

To ensure your analytics and metrics are as comprehensive as possible, we recommend the following:

Collect All Data Upfront: Ensure that all relevant identifiers, such as CRM IDs, user segments, or session variables, are gathered before sending the payload to Cekura.
Include Metadata in the Initial POST: All metadata should be included within the metadata object of your initial request to the observability API.
Verify Data Before Sending: Double-check that the metadata values are correct, as they will be used for filtering and calculating advanced metrics immediately after ingestion.

For more information on how to structure your metadata for optimal use in metrics, please refer to our documentation: https://docs.cekura.ai/documentation/key-concepts/metrics/basic-advanced-metrics#2-metadata

When making an inbound call to the system for testing, do I need to manually trigger the conversation?

The process for triggering a conversation depends on whether your agent is designed to receive calls (Inbound) or make calls (Outbound).

1. If your agent receives calls (Inbound Agent)

If you have an agent that receives calls, you do not need to manually dial. Simply provide the phone number associated with your agent in the Agent Settings. You can also override this number when you start a test run. Cekura will then initiate the call to your agent automatically.

2. If your agent makes calls (Outbound Agent)

If your agent is designed to make calls to people (which Cekura receives as inbound calls), the workflow is as follows:

Run Evaluators: Start the test run in Cekura first.
Get Phone Numbers: Cekura will provide you with a specific list of phone numbers to call.
Initiate Calls: You will need to make calls from your system to these provided numbers. You can leverage our APIs to automate this process.

3. Automatic Triggering via Integrations

If you use Retell, Vapi, or 11labs for your agent, Cekura can automatically trigger calls from your system to our numbers without manual intervention. To enable this:

Go to Agent Settings.
Select Voice Integration.
Choose your provider (e.g., Retell, Vapi, etc.).
Fill in the required details and enable the Outbound Auto Call flag.

For more detailed information on testing outbound agents, please refer to our documentation: Testing Outbound Calls.

How can I request a copy of your SOC 2 report?

To request a copy of our SOC 2 report, please reach out to us via support@cekura.ai or through your dedicated account manager. While the trust site does not explicitly mention the access process, you can find other relevant security and compliance information on our Trust Site here: https://tatva-labs-inc.trust.site/

How can I obtain a Business Associate Agreement (BAA) for HIPAA compliance?

Cekura supports HIPAA compliance for organizations that process Protected Health Information (PHI) during the testing and monitoring of AI voice and chat agents. To obtain a Business Associate Agreement (BAA), please follow these steps: 1. Reach out to your Cekura Account Manager or contact our support team. 2. Our legal department will provide a standard BAA for your review and signature. 3. Once executed, your organization can safely use Cekura to evaluate call recordings, transcripts, and metadata containing PHI. For example, healthcare providers can monitor patient-agent interactions to ensure accuracy and compliance without compromising data security.

What legal documents does Cekura provide for enterprise due diligence or legal review?

For most enterprise due diligence and legal review purposes, Cekura’s published Terms of Service available on the Cekura website is the standard document provided for customer review. If your enterprise contract or BAA references an “Enterprise Service Agreement,” that term refers to the published Terms of Service — no separate document exists by that name. If your legal team requires additional compliance agreements — such as a Business Associate Agreement (BAA) for HIPAA compliance, or a signed agreement tailored to your account — please reach out to your Cekura Account Manager or contact us at support@cekura.ai.

How can I change the inbound number used to make calls to the main agent?

To change the inbound number used to make calls to your main agent, you need to update the configuration within the specific evaluator. Follow these steps:

Edit Evaluator: Navigate to the specific evaluator you wish to modify within your scenario.
Go to Configuration: Locate and click on the Configuration section.
Update Phone Number: Find the phone number field and select your preferred number from the list of available phone numbers.

Note: If you want to use your own phone numbers (Bring Your Own Number), you can find instructions on how to integrate with Twilio here: https://docs.cekura.ai/documentation/key-concepts/phone-numbers/twilio

How can I integrate Cekura Cronjob success and failure notifications directly into a Datadog dashboard using webhooks?

To monitor the health of your Cekura Cronjobs within Datadog, you can leverage Cekura’s webhook notification system. While Datadog does not currently provide a direct, generic webhook endpoint that can ingest Cekura’s specific payload format without transformation, the integration can be achieved using an intermediary service.

Recommended Integration Workflow

Understand the Cekura Payload: Cekura sends detailed notifications regarding the success or failure of scheduled test runs. You can review the exact JSON structure of these notifications in the Cekura Webhook Documentation.
Create an Intermediary Webhook: Since Datadog requires data to be formatted for its specific Events or Logs API, you should set up a simple intermediary service (such as an AWS Lambda function, Google Cloud Function, or a lightweight server). This service will:
- Receive the POST request from Cekura.
- Extract relevant fields (e.g., job status, agent name, or failure reason).
- Transform and forward this data to the Datadog Events API.
Configure Cekura Notifications:
- In the Cekura platform, navigate to the settings for your specific Cronjob.
- Provide the URL of your intermediary service in the webhook configuration field.
Build the Datadog Dashboard: Once the events are flowing into Datadog, you can use the Datadog Dashboard builder to create visualizations. You can filter by the event source or custom tags you’ve sent from your intermediary script to track success/failure rates over time.

By using this approach, you ensure that your monitoring team receives real-time alerts and visual data regarding your AI agent testing schedules directly within your existing Datadog environment.

How do I choose a subscription package for load testing, and how can I perform load testing for inbound and outbound calls?

Cekura is built to handle high-scale performance testing, supporting north of 2000 concurrent calls.

Subscription Packages and Concurrency

Our subscription plans include specific concurrency limits. Higher concurrency requires higher committed resources from our infrastructure to ensure your tests run smoothly and accurately.

Upgrading Concurrency: If the concurrency level you need to test exceeds your current package limits, please reach out to us at support@cekura.ai or contact your dedicated Account Manager to discuss an upgrade.

How to Perform Load Testing (Inboundand Outbound)

Whether you are testing inbound or outbound agents, the process involves simulating high traffic using your defined evaluators (test cases). Follow these steps to set up your load test:

Select Evaluators: Pick one or more evaluators (test cases) that you want to run as part of the load test.
Set the Frequency: To reach your desired concurrency level, you must distribute the load across your selected evaluators. Use the following formula:
- Frequency = Total Target Concurrency / Number of Evaluators Selected
Execute the Test: Once configured, Cekura will simulate the calls—either by dialing out to your agent (outbound) or acting as the caller for your agent (inbound)—to stress-test the infrastructure and agent logic.
Analyze Results: After the run, Cekura evaluates the performance against your defined metrics and expected outcomes to identify any infrastructure issues or false flagging.

For a comprehensive step-by-step walkthrough, please visit our Load Testing Documentation.

Can I define custom tools or scripts to perform setup and cleanup tasks before and after a Cekura test for end-to-end integration testing?

Yes, you can perform setup and cleanup tasks by integrating Cekura into your automated end-to-end testing workflows. This allows you to verify that your AI agent integrates correctly with your entire system architecture.

Recommended Workflow for Integration Testing

To achieve a full end-to-endtest cycle, you can wrap the Cekura testing process within your own automation scripts or CI/CD pipeline:

Setup (Prep Work): Run your custom scripts to prepare the environment. This might include database seeding, provisioning test users, or initializing specific system states.
Initiate Cekura Test: Programmatically trigger a Cekura test run using our APIs. Cekura will use the agent context to generate test cases (evaluators), execute them, and evaluate the outcomes based on your defined metrics.
Cleanup: Once the Cekura evaluation is complete, run your teardown scripts to reset the environment or clear test data.

By using this sequence, you can ensure that your agent performs as expected within the context of your broader system. For more information on programmatically interacting with the platform to trigger these runs, please refer to our API documentation at https://docs.cekura.ai/api-reference/.

Why are successful parallel outbound calls showing as timeouts in the dashboard?

Timeouts in the Cekura dashboard during outbound testing usually occur when the platform cannot successfully link an incoming call to an active test evaluator run. This is particularly common during parallel testing if the runs are not initialized correctly.

Common Causes for Timeouts

Unmatched Evaluator Runs: Each individual call must correspond to a specific triggered evaluator run. For example, if you make three parallel calls but only trigger the “Run Evaluator” action once in the dashboard, Cekura will only record one run. The other calls will not be matched to a session, leading to timeouts for the expected test cases.
The Outbound Timeout Window: By default, once you trigger an evaluator (via UI or API), you must initiate the call within 5 minutes. If the call is received after this window, the session expires and results in a timeout.
- Pro Tip: This window is configurable up to a maximum of 5 minutes via Settings -> Project -> General -> Outbound Timeout.
Caller ID Mismatch: Cekura identifies calls based on the phone number specified in your Agent Settings. If the incoming caller ID does not match the expected number for that specific test, the system cannot link the call to the evaluator.
Mode Configuration: When running outbound tests, ensure you have selected the correct mode (same_number vs different_numbers) for handling multiple concurrent calls.

Best Practices for Parallel Testing

To ensure all parallel calls are tracked accurately and to avoid manual errors:

Use the API: Instead of manually copy-pasting numbers from the UI, use the Run Scenarios API. This allows you to programmatically trigger multiple runs and receive the specific destination numbers for each call in real-time.
One-to-One Mapping: Ensure that for every call your system initiates, a unique request is sent to the Cekura “Run Evaluator” endpoint first. This creates the necessary placeholder in our system to receive and evaluate the call data.

If you continue to see timeouts despite these configurations, please verify that your SIP routing is correctly passing the caller ID to Cekura.

How can I configure a custom domain and logo for my reports, and what should I do if I encounter an error during the domain setup?

To white-label your reports and shareable links in Cekura, you can configure a custom domain and logo through the platform settings. Please note that custom domain configuration is an Enterprise-level feature and is not available on the Startup plan.

How to Configure Custom Branding

1.Navigate to Settings in your Cekura dashboard. 2. Go to the Domains section. 3. Enter your domain details and upload your company logo.

Troubleshooting and Plan Restrictions

If you encounter an error while setting up your domain, please check the following:

Plan Eligibility: Custom domains and logo configurations are restricted to Enterprise accounts and are no longer supported on the Startup plan. If you are on a Startup plan, the fields may appear in the UI, but the configuration will not be saved.
Manual Workaround: If you need to provide a branded PDF report immediately and cannot update the logo via the dashboard, you can manually edit the Cekura logo from the generated PDF using a standard PDF editor.

For further assistance or to upgrade your plan, please visit the Cekura Documentation.

How can I perform end-to-end testing of a chatbot’s LLM flow via SMS to ensure all conversation types are handled correctly?

Cekura provides native support for testing SMS-based chatbot flows, allowing you to validate the end-to-end logic of your AI agents. You can test these flows by either using Cekura’s native SMS integration or by connecting your custom backend via an API or WebSocket.

How to Set Up SMS Testing

To begin testing your SMS agent, follow these steps in the Cekura platform:

Configure the Agent: Navigate to Agent Settings > Chatbot.
Select Provider: Choose SMS as the provider. If this option is not yet visible in your UI, please contact Cekura support to have it enabled for your account.
Define Evaluators: Create new test cases (evaluators) or reuse existing ones that define the expected behavior and metrics for your conversation.
Run the Test: In the testing interface, set the chat mode to SMS and click Run. Cekura will simulate the conversation and evaluate the performance based on your defined metrics.

Integration Options for Custom Backends

If you are using a custom backend or want to test the AI logic without relying on a carrier like Twilio, you have two primary options:

API Endpoint: Expose an API endpoint that accepts a user’s message and returns the agent’s response. This allows Cekura to interact with your system as if it were sending and receiving SMS messages.
WebSocket Integration: Cekura supports chat-based testing via WebSockets. You can use a sample script to convert your local logic into a WebSocket endpoint and route it through a toollike ngrok to provide Cekura with a public URL. You can find a reference implementation here: LLM WebSocket Server Example.

Key Benefits

E2E Validation: Test the entire LLM logic from the initial user text to the final agent response.
Automated Evaluators: Automatically generate test cases based on your agent’s context to ensure comprehensive coverage of conversation types.
Seamless Transition: Easily switch between voice and SMS testing if your system supports multi-channel dynamic transitions.

For more details on setting upyour agent and defining metrics, please refer to the Cekura Agent Setup Guide.

How can I correlate incoming calls with the specific evaluation runs that triggered them in the Cekura API?

Correlating incoming calls to specific test runs is essential for organizing your evaluation data and ensuring each call is matched to the correct test case. Cekura provides two primary methods to achieve this: randomized phone numbers and custom integrations.

1. Randomized Phone Numbers (Immediate Correlation)

The simplest way to distinguish between concurrent calls is to enable the randomize phone number flag. By setting the mode to use different numbers in your request, each parallel test case will originate from a different caller ID. When calling the Run Scenarios endpoint, ensure your configuration is set to randomize the outbound numbers. This allows your system to match the incoming call’s phone number to the specific run ID immediately upon receipt.

2. Custom Integration (Deterministic Correlation)

For a more robust and scalable solution, you can implement a custom integration. This allows you to link your internal system’s unique identifiers (like a CallSid) with Cekura’s evaluation runs.

The Process: Once a call is complete on your end, you send Cekura the transcript and your local call ID (which Cekura labels as the provider_call_id).
Automated Matching Logic: Cekura uses a deterministic matching algorithm to correlate your calls to the evaluation runs based on several factors: phone numbers, call durations, start times, and transcript fuzzy matching.
Retrieval: You can then fetch specific runs using the List Runs endpoint and filtering by the provider_call_id to find the corresponding Cekura run ID.

Note for Retell and Vapi Users

If you are using a supported provider like Retellor Vapi, you do not need to manually send the correlation data. If the integration is correctly configured in your Cekura dashboard, the provider_call_id will be automatically populated in your run results, allowing for seamless tracking without additional code.

What does the response consistency metric test for?

The Response Consistency metric is a pre-defined evaluation tool in Cekura designed to ensure your AI agent remains reliable and stable throughout a conversation. It primarily focuses on two key areas:

Information Stability: This checks if the agent provides the same answer when asked the same question multiple times within a single session. For example, if a user asks for the price of a service three times, the agent should provide the exact same pricing information each time.
Data Persistence and Accuracy: This verifies that the agent correctly remembers and utilizes information provided by the user. For instance, if a user states their phone number is “123-456-7890”, the metric checks if the agent uses that same number when repeating it back or referencing it later in the interaction.

You can easily add this to your test cases or monitoring setup by selecting it from the list of pre-defined metrics. This is a critical check for building trust, as inconsistencies in pricing or data handling can lead to a poor user experience. For more information on how to use metrics to evaluate your agent’s performance, please visit our documentation: https://docs.cekura.ai/documentation/key-concepts/metrics/basic-advanced-metrics

How do I configure a Retell chat agent in Cekura and ensure that the chat tests appear on the dashboard?

To configure a Retell chat agent and ensure your test results are visible on the Cekura dashboard, follow these steps:

1. Prepare your Retell Agent

First, ensure you have a chat-compatible version of your agent. In your Retell dashboard, you can Copy as chat agent any of your existing voice agents to enable text-based interactions.

2. Connect Retell to Cekura

Next, link your agent to the Cekura platform:

Navigate to Agent Settings on your Cekura dashboard.
Go to the Chatbot Control tab.
Select Retell as the provider.
Enter the required connection details for your Retellagent.

3. Run and Monitor Tests

To execute your tests and ensure they are correctly logged to your dashboard:

Navigate to your Evaluators.
When initiating a test run, select the Run as Text option. This ensures the system treats the interaction as a chat session rather than a voice call.

Once the test is complete, the results will automatically populate your dashboard for analysis. For more specific details on this setup, please refer to the Retell Integration Guide.

What is the best way to evaluate a scenario where a caregiver or family member picks up instead of the patient, and should I adjust agent settings or generate evals for this?

To evaluate scenarios where a caregiver or family member answers the call instead of the patient, the best approach is to generate specific evaluators (test cases) using the “Extra Instructions” feature rather than tweaking your agent’s core settings. This allows you to test the agent’s robustness against different personas without altering its underlying logic.

Step-by-Step Instructions

Navigate to Evaluator Generation: Within the Cekura platform, go to the section where you generate your test cases.
Apply Extra Instructions: Use the “Extra Instructions” field to provide specificcontext for the AI persona.
Use a Specific Prompt: Enter a prompt similar to the following:
“Generate a scenario where the caregiver or family member picks up instead of the patient. The caregiver should go through the flow on the patient’s behalf.”
Run the Test: Execute the generated test cases to observe how your agent handles the interaction with a non-patient party.

Why this approach?

By generating evaluators with specific instructions, you can simulate real-world variability. This method tests if your agent can maintain the conversation flow and meet its objectives even when the primary contact is unavailable, which is a critical metric for healthcare-related AI agents.

Will incoming calls be restricted to a specific phone number, or can I still receive calls from multiple different numbers?

Cekura does not restrict incoming calls to a specific phone number; the platform is designed to be fully compatible with all numbers. Each evaluator (test case) has a phone number field, you can select from the pool of numbers available or bring your own phone numbers. You can continue to receive and test calls from multiple different sources, which is particularly useful for simulating diverse real-world scenarios for your AI voice agents. This flexibility ensures that your testing and monitoring workflows are not limited to a single originating caller ID. For example, when testing an agent, you can run test cases that originate from various numbers to ensure the agent’s logic and metrics hold up across different caller contexts. Similarly, for monitoring, Cekura allows you to send call recordings or transcripts from any number to be evaluated.

Does Cekura support a webhook for receiving call observability metrics and evaluation results to store in an external database for analytics?

Yes, Cekura supports webhooks that allow you to receive real-time results of call observability metrics and evaluations. This feature is specifically designed to help teams integrate Cekura’s insights into their own infrastructure, such as external databases or custom analytics dashboards.

How it Works

When Cekura finishes evaluating a call based on your defined metrics, it can automatically trigger a webhook. This sends a POST request containing the evaluation results and associated metadata to a destination URL of your choice.

Key Benefits

Data Persistence: Store your evaluation results inyour own database for long-term record-keeping.
Custom Analytics: Use your preferred BI tools (like Tableau, Looker, or Grafana) to visualize Cekura’s metrics alongside other business data.
Automation: Trigger downstream workflows in your system based on specific evaluation outcomes.

Documentation

You can find the technical specifications, including the payload structure and setup instructions, in our documentation here: Webhook Format.

Which model does Cekura use to calculate the metric results?

Cekura primarily utilizes Gemini models to calculate metric results and perform evaluations across both its testing and monitoring workflows. These models are used to analyze agent interactions—whether they are generated during automated testing or captured from live call recordings and transcripts—against the specific metrics and expected outcomes defined in your setup. By leveraging Gemini’s advanced reasoning capabilities, Cekura ensures high-quality and consistent evaluation of your AI voice and chat agents. For more information on how metrics are structured and used within the platform, you can refer to the documentation on Basic and Advanced Metrics.

How can I link an ElevenLabs account to view conversation IDs and tool call timestamps for evaluator test calls?

To link your ElevenLabs account and track granular conversation data such as Conversation IDs and tool call timestamps, follow these steps:

Navigate to Agent Settings: In the Cekura dashboard, go to the settings for the specific agent you are testing.
Select ElevenLabs: Under the provider or voice settings, select ElevenLabs.
Provide API Access: Ensure you have provided the necessary API key permissions to allow Cekura to fetch conversation data.

Why link your ElevenLabs account?

Conversation ID Tracking: Cekura will automatically match your evaluator test calls to the specific ElevenLabs Conversation ID.
Tool Call Visibility: You will be able to see exactly when tool calls were performed during the interaction, making it easier to debug the agent’s logic and response times.

How should I format transcript data for Cekura’s observability API if my source only provides a single timestamp and not an end time?

Before manually modifying your data, first check if your current transcript format is natively supported by Cekura here: https://docs.cekura.ai/documentation/advanced/transcript-format. If your format is supported, you can send it as-is without manual transformation. If your format is not supportedand you only have a single timestamp available (as is common with some platforms like Trillet AI), you should follow these steps to format it for Cekura’s Observability API:

Recommended Workaround

Map the single timestamp provided by your source to both the start and end time fields in the Cekura transcript object.

Use the provided timestamp as the start_time.
Set the end_time to be identical to the start_time.

Important Note on Latency

By setting the start and end times to the same value, Cekura will not be able to provide accurate latency or duration metrics for those messages. While other evaluations (such as sentiment, accuracy, and goal completion) will work perfectly, the latency will be recorded as zero.

Example Transformation

Source Data:

{
  "timestamp": "2025-09-02T10:36:59.505Z",
  "role": "agent",
  "content": "One moment while I send you the link"
}

Cekura Format:

{
  "role": "agent",
  "content": "One moment while I send you the link",
  "start_time": "2025-09-02T10:36:59.505Z",
  "end_time": "2025-09-02T10:36:59.505Z"
}

How can I add Time to First Audio (TTFA) as an infrastructure metric for a Pipecat and Twilio setup?

In Cekura, you can track Time to First Audio (TTFA) and other infrastructure-related latencies by leveraging the platform’s built-in metrics and transcript data. Cekura provides visibility into these metrics to help you monitor performance across your Pipecat and Twilio integrations.

Accessing Latency Metrics

You can access detailed latency data for your calls through Cekura’s Python-based metric evaluation system. The latency_metrics object provides timing information for the conversation flow.

TTFA Calculation: If your agent is configured to speak first (e.g., a greeting), the TTFA is represented by the start_time (in seconds) of the first message from the main agent.
Reference: Cekura Latency Metrics Documentation

Handling Complex Scenarios with Transcript Data

In cases where the conversation flow is more complex—such as when a testing agent speaks first or you need to measure specific turn-around times—you should use the transcript_json directly.

Granular Timing: The transcript_json contains a detailed breakdown of every message, including precise start and end timestamps for each turn. This allows you to programmatically calculate TTFA or any other custom latency metric based on the specific sequence of events in the call.
Reference: Cekura Transcript JSON Documentation

Implementation Workflow

To trackthese metrics for your Pipecat + Twilio agent:

Send Call Data: Ensure your call recordings, transcripts, and metadata are sent to Cekura via the observability APIs.
Define Metrics: Use the Python code metrics to extract the start_time of the first agent message from the latency data.
Monitor: View the resulting TTFA plots in your Cekura dashboard to identify trends or spikes in infrastructure performance.

How do I set up the Cekura Slack integration to receive evaluation results and call alerts in a channel?

The Cekura Slack integration uses a single setup path that covers both simulation test result notifications and live production call evaluation alerts. Once configured, the Cekura bot posts to your chosen channel whenever a metric evaluation completes — whether from a simulation run or a live production call. To set up the integration:

Go to Settings in the Cekura platform.
Ensure you are in the relevant project.
Navigate to Integrations and click Configure next to Slack.
Follow the steps listed under ‘How to setup Slack notifications’ to connect your workspace.
Add the Cekura Bot: After the integration is set up, you must manually add the bot to your desired Slack channel. You can do this by typing /invite @Cekura in the channel or using the ‘Add apps’ option in Slack.
Enable Notifications: Make sure notifications are enabled in your project settings.

Once connected, your team can view evaluation results, click through to specific runs or call logs, and submit metric feedback directly from Slack.

Is screen recording or video/microphone use disabled in the Cekura dashboard?

No, Cekura does not disable screen recording, video, or microphone usage on its dashboard. Users are free to use third-party recording tools (such as Loom) or system-level recording features while navigating the platform. If you are experiencing issues with recording or media access, we recommend checking the following:

Browser Permissions: Ensure that your browser has granted the necessary permissions to your recording tool or the Cekura site.
System Privacy Settings: On macOS or Windows, verify that your recording application has permission to record the screen, microphone, or camera in the system settings.
Extension Conflicts: Sometimes other browser extensions can interfere with media capture. Try recording in an incognito window or disabling other extensions to troubleshoot.

If you continue to face difficulties, please contact our support team so we can investigate further. Cekura is fully GDPR compliant, ensuring that all data processing activities meet the rigorous standards set by the European Union for data protection and privacy. We provide a Data Processing Agreement (DPA) to our customers upon request to help satisfy legal and regulatory requirements. Please note that the DPA is provided upon request for a fee. To initiate this process or to receive further documentation regarding our compliance standards, please contact our support team.

What are Cekura’s EU data residency options and standard contract terms?

EU Data Residency

Cekura provides data residency options that depend on the type of data being processed:

Production monitoring data: EU data residency is currently live. If you are using Cekura to monitor live agent conversations (including recordings and transcripts), this data can be hosted within the EU region to meet local compliance requirements.
Synthetic testing data: Currently hosted in the US. A version supporting EU residency for synthetic testing data is in development.

For GDPR compliance documentation and Data Processing Agreements, see the section above.

Contract Terms

The standard minimum contract term is 12 months. A one-month break clause is available — if the product or support does not meet your needs within the first month, you have the option to exit the agreement.

Data Portability

To avoid vendor lock-in, Cekura provides full API access to retrieve your data at any time. See our API Reference for Getting Call Data for details on programmatically exporting call data and metadata. For custom contract arrangements or enterprise pricing, contact support@cekura.ai.

How can I schedule runs of evaluators in the platform?

Yes, you can schedule automated runs of your evaluators using the Cronjobs feature in Cekura. This allows you to perform regular testing and monitoring of your AI agents without manual intervention. To set up a schedule, follow these steps: 1. Select Evaluators: Go to your evaluators list and check the boxes next to the ones you want to schedule. 2. Open Actions: Click the Actions button found in the top right corner of the dashboard. 3. Configure Cron Job: Select Cron jobs from the menu. 4. Set Schedule: Choose your desired frequency and timing for the runs. Once configured, Cekura will automatically execute these test cases based on your defined schedule and evaluate them against your metrics.

How do I configure an agent for text-based runs, and does it require a different configuration than voice-based runs?

Yes, text-based runs require a different configuration than voice-based runs. While voice-based simulations primarily use a phone number to initiate a call, text-based simulations rely on digital interfaces and integrations to communicate with your agent.

Configuration Options for Text-Based Runs

Cekura provides several flexible methods to connect to your agent for text-based testing:

Direct Integrations: You can connect directly to popular AI voice and chat providers such as Retell, Vapi, and others.
SMS Testing: Cekura supports testing agents via SMS for mobile-based chat scenarios.
WebSocket Bridge: For custom or proprietary agents, you can configure a WebSocket URL in the agent settings. This URL acts as a bridge, allowing Cekura to send and receive text messages directly to your agent’s logic.

How to Set Up Your Agent

Navigate to Agent Settings: Open the specific agentyou want to test in the Cekura dashboard.
Select Connection Method: Choose the appropriate integration type (e.g., WebSocket, SMS, or a specific provider like Vapi).
Input Connection Details: If using a WebSocket, provide the specific WebSocket URL. If using a direct integration, ensure your API keysor provider-specific identifiers are correctly mapped.
Initiate the Run: When you go to Configure Run, select the Text Based option. Cekura will then use your configured digital interface instead of attempting to dial a phone number.

Cross-Platform Testing

A significant advantage of the Cekura platform is that you can run the same evaluators (test cases) across both text and voice channels. This ensures that your agent’s logic, knowledge, and guardrails remain consistent regardless of how the user interacts with it. For more detailed information on setting up these connections, please refer to the Chat-Based Testing Documentation and the Agent Setup Guide.

Do you have a testing integration with Telnyx?

Currently, Cekura does not have a native API integration with Telnyx for advanced features like tool call testing. However, you can still fully test your Telnyx-based voice agents by using their assigned phone numbers.

How it Works

Cekura is designed to interact with AI agents directly via phone calls. If your Telnyx agent has a phone number, Cekura can dial it to execute test cases and evaluate the agent’s performance.

Setup Steps

Agent Context: Define the context and purpose of your agent within the Cekura platform.
Phone Number: Provide the Telnyx phone number you wish to test.
Evaluators: Generate or define test cases (evaluators) based on your expected outcomes.
Run Tests: Cekura will call the number, simulate the conversation, and evaluate the results based on your defined metrics.

Important Considerations

Tool Call Testing: Because this method uses a standard phone connection rather than a deep API integration, testing specific backend “tool calls” (verifying that the agent triggered a specific internal function) is not currently supported for Telnyx.
Onboarding: If you are setting up an agent without a direct integration, you can follow the Onboarding Guide located in the bottom-left corner of the Cekura dashboard for a step-by-step walkthrough.

For more information on configuring your agent for testing, please refer to our Agent Setup Guide.

How can I set up a metric to track when the bot goes silent for more than a specific duration?

To accurately track instances where a bot goes silent, you should use the Infrastructure issues metric. It is important to avoid using an LLM-as-a-judge metric for this purpose, as LLMs evaluate transcript content and cannot reliably measure technical silence or specific time durations.

Configuration Steps

The Infrastructure issues metric is a statistical metric that is usually added as a default project metric in the Cekura platform. If you need to set it up or adjust it, follow these steps:

Navigate to Metrics: Go to the metrics configuration section of your project.
Add the Metric: If it is not already present, add the Infrastructure issues metric from the pre-defined metrics list.
Edit the Duration: Click the Edit button next to the Infrastructure issues metric.
Specify the Threshold: Enter the specific duration (e.g., 3 minutes) over which silence should be marked as a failure.
Save: Save your changes to apply the metric to your monitoring or testing workflow.

Choosing the Right Metric

Infrastructure issues: Use this metric when you specifically want to detect when the main agent is going silent for more than the configured time.
Silence detection: If you care about detecting silence from either the main agent or the testing agent/user, use the Silence detection metric instead.

Does Cekura provide an API to measure latency and other observability metrics for calls?

Yes, Cekura provides comprehensive API support for observability, allowing you to measure latency and other critical conversational metrics for your AI voice and chat agents.

How Observability Works in Cekura

To monitor your calls, you can send conversation details—such as call recordings, transcripts, and any relevant metadata—to Cekura via our APIs. Once the data is ingested, Cekura evaluates the calls based on the metrics you have defined.

Key Capabilities:

Latency Tracking: You can include latency data within the metadata sent to Cekura to monitor and analyze the responsiveness of your agents over time.
Conversational Metrics: Beyond latency, you can define and track a wide range of metrics to evaluate the quality and effectiveness of the interaction.
Data Retrieval: You can programmatically fetch call data and evaluation results using our API endpoints.

Relevant Resources:

Fetching Call Data: To retrieve specific call details, you can use the Get Call API.
Using Metadata: To understand how to leverage metadata (like latency) in your evaluations, see our Metadata Documentation.

What are the implementation requirements and costs for the ‘voice tone + clarity’ metric, and will an audio recording always be required for it to work?

The ‘voice tone + clarity’ metric is an audio-based evaluation tool designed to monitor the quality of AI voice agents. To use this metric, an audio recording is always required as the analysis is performed directly on the sound file. Implementation involves sending your call recordings along with anyrelevant metadata to Cekura for evaluation. The cost for using this metric is 0.2 credits per minute of audio recording processed. For more information on how to send call data via API, visit: https://docs.cekura.ai/api-reference/observability/get-call.

How can I access a list of available voice IDs and their configuration settings (such as gender, tone, and accent) to facilitate bias testing and personality overrides?

In Cekura, Personalities define the characteristics of the virtual caller, including their gender, accent, and tone. This feature is essential for bias testing, allowing you to observe if your AI agent behaves differently when interacting with various demographics.

How Personalities Work

By default, every test case (evaluator) you create has a specific personality attached to it. However, you do not need to manually edit each evaluator to test different voices. Instead, you can use the Personality Override feature during runtime to test the same scenario across multiple profiles.

OverridingPersonalities at Runtime

To facilitate bias testing without modifying your existing evaluators, follow these steps:

Select Evaluators: In the Cekura dashboard, select the test cases you wish to run.
Initiate Run: Click the Run button.
Additional Configuration: In the run setup menu, navigate to the Additional Configuration section.
Select Personalities: Choose the specific personalities (e.g., American Female, British Male, Indian Female, etc.) you want to include in this test run.

Execution Logic

When you override personalities, Cekura performs one run per scenario for every personality selected. For example, if you select 10 evaluators and 3 different personalities, the system will automatically execute 30 total test calls and compile the results into a single report. This allows you to efficiently identify if gender or accent variations impact the agent’s response logic or performance. For more information on managing these profiles, please refer to the Personalities Documentation.

Can parameters from user profiles be included in the expected outcome of a manually defined scenario?

Yes, Cekura supports the use of Test Profile information within the Expected Outcome of your scenarios. This allows you to create dynamic assertions that validate whether an agent is correctly identifying or verifying specific user data.

How to Use Profile Parameters

You can access data from an attached test profile by using the placeholder syntax. For example, if you want to ensure the agent correctly validates a date of birth, you can include {{test_profile.dob}} in your expected outcome description.

Example Use Case: Verification Testing

If you are testing a verification workflow where a user might provide incorrect details, your expected outcome could be structured like this:

Scenario: User provides a date of birth that does not match the profile.
Expected Outcome: “The agent should compare the user’s input against {{test_profile.dob}}. If the input is incorrect, the agent should ask for the DOB again. After two failed attempts, the agent must inform the user it cannot assist and direct them to call the facility directly.”

Recommended Workflow

Attach a Test Profile: Ensure the scenario is linked to a test profile containing the necessary metadata.
Define Logic: Use placeholders to reference profile fields so the evaluator knows exactly what value the agent should be checking for.
Automated Generation: You can also use the ‘Generate Scenarios’ feature with instructions like “Generate scenarios where the user provides fake verification information” while attaching a profile to automate this testing at scale.

For more technical details on accessing test profile data in your metrics and outcomes, please refer to the Basic and Advanced Metrics documentation.

Can I integrate and use my own custom ASR model within your simulation and observability environment for Saudi Arabic agents?

Yes, integrating custom ASR models for simulation and observability is supported. This feature is available exclusively on our Enterprise plans. Please reach out to our support team for more details on how to set this up.

How do I set up or activate a subscription?

To subscribe or re-activate a Cekura plan, visit dashboard.cekura.ai/select-plan to view available plans and complete signup. Alternatively, from within the dashboard navigate to Settings → Billing → Manage Billing. This opens the Stripe billing portal where you can add or update a payment method and choose or reactivate a plan. Note: the reactivation option in the Stripe portal requires a payment method to already be on file — if no option appears, use the dashboard.cekura.ai/select-plan URL directly. Subscription and credits: An active subscription is required to access the Cekura platform at all times. Pre-loaded credits cannot be used while the subscription is inactive. Credits do not expire — once you reactivate, any credits you have purchased will be available immediately. For questions about custom or Enterprise plans, contact support@cekura.ai.

Can I freeze or pause my subscription for a few weeks while my project is on hold?

Yes, if your project is on hold or you are between pilot phases, you can easily pause your subscription directly from the Cekura dashboard. To pause your subscription:

Log in to your Cekura account.
Navigate to Billing.
Click on Manage Plan.
You will be redirected to our secure Stripe billing portal.
Select the option to pause your subscription.

To reactivate your plan, return to the same Billing → Manage Plan section and follow the prompts in the Stripe portal. All your existing agent configurations, evaluators, and historical data will remain intact. If the reactivation option is not available in the Stripe portal (for example, no payment method is on file), go directly to dashboard.cekura.ai/select-plan to choose a plan. If you still cannot reactivate, contact support and the team can activate it for you directly.

How do I cancel my subscription?

You can cancel your subscription directly from the Cekura dashboard through the Stripe billing portal. To cancel your subscription:

Log in to your Cekura account.
Navigate to Billing.
Click on Manage Plan.
You will be redirected to our secure Stripe billing portal.
Select the option to cancel your subscription.

If the cancel option is not visible or you are unable to complete the cancellation, contact support and the team can assist directly.

How can I bulk import scenarios and what file formats are supported?

Cekura supports the bulk import of scenarios (also referred to as test cases or evaluators) to help you scale your testing and monitoring efforts efficiently.

Supported File Formats

You can provide your scenarios in various formats. The most commonly used and recommended formats are:

CSV: Ideal for simple, tabular scenario data.
JSON: Best for structured data or complex test cases.
Custom Formats: If your data is in a different format, the Cekura team can assist with internal conversion to ensure it is compatible with the platform.

How to Import Scenarios

API Import: You can programmatically import scenarios using Cekura’s APIs. This is the most efficient method for teams looking to automate their testing pipeline. You can explore our API Reference for more technical details.

How do I add an extra member to my workspace?

Adding team members to your Cekura workspace allows for collaborative testing and monitoring of your AI agents. To invite a new member, follow these steps:

Open Settings: Click on the Settings icon found in the top right corner of your dashboard.
Go to Members: Select the Members tab from the navigation menu within the Settings area.
Click Invite: Select the Invite button to open the invitation dialog.
Send Invitation: Enter the email address of the team member you wish to add and send the invite.

Note: Please ensure you have the necessary administrative privileges, as workspace management typically requires admin access. If you are unable to see the ‘Members’ tab or the ‘Invite’ button, you may need to ask an existing administrator to update your permissions or send the invitation for you.

Why might a user lose access to a Cekura workspace?

Users can lose access to a Cekura workspace when an organisation Admin removes their membership. This most commonly occurs when the organisation has reached its plan’s member limit — an Admin may clean up the member list to free up capacity for other users. If a team member unexpectedly loses access, an Admin can re-invite them at any time: go to Settings → Members → Invite and enter their email address. If you need to check your plan’s member limit or request an increase, reach out to support@cekura.ai.

How can I interface a custom-hosted voice agent using the OpenAI Realtime API with Cekura to run evaluations?

Cekura provides multiple ways to interface with your voice agent for testing and evaluation, depending on your hosting infrastructure and communication protocol. To run evaluations against a custom-hosted agent (such as one using the OpenAI Realtime API), you can use one of the following connection methods:

1. WebRTC (LiveKit & Pipecat)

If your agent is hosted using standard WebRTC protocols, Cekura offers native support for:

LiveKit: This is a common choice for OpenAI Realtime API implementations. You can follow our LiveKit Integration Guide to connect your hosted agent to Cekura.
Pipecat: Cekura also supports the Pipecat protocol for seamless audio streaming and evaluation.

2. Telephony and SIP

For agents that are accessible via traditional voice channels, Cekura supports:

Telephony: Testing via standard phone numbers.
SIP: Connecting directly to your VoIP infrastructure via Session Initiation Protocol.

3. Custom Implementations

If you use a proprietary WebRTC stack or a custom audio streaming method that does not fall into the categories above, Cekura can build a custom connector to support your specific architecture. This ensures that Cekura can correctly feed audio to and receive audio from your agent for high-fidelity testing.

The Testing Workflow

Once the connection is established, you can follow the standard Cekura workflow to evaluate your agent:

Define Context: Provide the agent’s purpose and connection type to Cekura.
Generate Evaluators: Cekura will automatically generate test cases (evaluators) or allow you to define your own metrics.
Run Tests: Cekura interacts with your agent through the chosen connection type (WebRTC, SIP, etc.).
Evaluate Results: The platform analyzes the conversation based on defined metrics and expected outcomes to provide a performance report.

How can I prevent the testing agent from ending the call?

The testing agent ends a call when its evaluator condition contains the <endcall /> tag in the action text. To prevent the testing agent from ending the call, remove the <endcall /> tag from the evaluator’s condition instructions. For example, change:

"Thank you, goodbye. <endcall />"

to:

"Thank you, goodbye."

Without the <endcall /> tag in any condition, the testing agent will stay on the line and let the main agent terminate the session. See the Conditional Actions guide for full tag reference.

A UI option to disable the end-call capability at the evaluator level was previously available but has been removed — it was redundant with simply omitting the tag from the evaluator instructions.

How do I set up outbound tests?

Setting up outbound tests in Cekura allows you to evaluate how your AI voice or chat agents perform when initiating contact, such as for appointment reminders or follow-up calls. To set up these tests, follow these steps: 1. Define Agent Context: Provide the background, purpose, and knowledge baseof your agent to help Cekura understand the expected behavior. 2. Generate Evaluators: Create test cases (evaluators) manually or use Cekura’s auto-generation feature to define success criteria and expected outcomes for the interaction. 3. Configure Outbound Parameters: Set up the specific triggers and destination details for the outbound reach-out. 4. Execute and Evaluate: Run the test cases and review the results based on defined metrics to ensure your agent is performing as expected. For a comprehensive step-by-step guide on configuring these tests, please refer to our official documentation: https://docs.cekura.ai/documentation/guides/testing-agents/outbound-evaluators

How can I limit the number of concurrent evaluator runs to manage parallel call capacity?

To manage parallel call capacity and avoid hitting concurrency limits from providers (such as VAPI), Cekura allows you to set a maximum limit for concurrent evaluator runs at the organization level. This ensures that when you trigger a batch of test cases, the system respects your infrastructure’s constraints.

How to configure the Parallel Call Limit:

Log in to your Cekura dashboard.
Navigate to Settings in the sidebar.
Under the Organisation section, select General.
Locate the field labeled Parallel Call Limit.
Enter the maximum number of concurrent calls or evaluators you want to allow (e.g., if your VAPI limit is 10, you might set this to 10).
Save your changes.

Once this is set, Cekura will automatically queue and throttle your evaluator runs to ensure they do not exceed the specified limit, preventing errors related to rate limiting or concurrency issues during your testing workflow.

How can I test LiveKit voice agents that do not have phone numbers attached using Cekura?

Yes, Cekura supports testing LiveKit voice agents directly via WebRTC, allowing you to run automated tests even if your agent does not have a phone number or PSTN endpoint attached. This is particularly useful for testing agents used in web or mobile applications.

How to Set Up LiveKit WebRTC Testing

To test your LiveKit agent, you will need to integrate your LiveKit environment with Cekura. Follow these steps:

Access Integration Settings: Navigate to the LiveKit integration guide in the Cekura documentation to understand the required parameters.
Provide Credentials: You will typicallyneed your LiveKit URL, API Key, and API Secret to allow Cekura to initiate WebRTC sessions.
Define Agent Context: Just like phone-based testing, provide the context and instructions for your agent so Cekura knows how to interact with it.
Generate Evaluators: Cekura will automatically generate test cases (evaluators) based on your agent’s context, or you can manually define specific scenarios you want to test.
Run and Evaluate: Execute the test cases. Cekura will connect to your LiveKit agent via WebRTC, conduct the conversation, and evaluate the performance based on your defined metrics.

Billing and Credits

Testing LiveKit agents via WebRTC utilizes credits in the same way as standard phone call testing. There is no difference in the billing rate between PSTN and WebRTC-based tests. For a detailed step-by-step walkthrough on setting this up, please refer to our official guide: LiveKit Integration Guide.

How can I configure daily reports to be sent to a specific email address instead of all project members?

To route daily reports to a specific email address (such as a shared inbox or a Google Group) rather than all project members, you need to ensure that the target email is registered as a member of your project. Follow these steps to configure your reporting preferences:

1. Invite the Target Email Address

Currently, Cekura sends reports to active members within the organization. You must invite the specific email address you want to receive reports to your Cekura account.

2. Create an Account for the Email

The invited email must have an active Cekura account to receive notifications.

If usinga shared inbox or Google Group: Since these often cannot use SSO (Single Sign-On), you should create the account using the Email/Password signup method.
Ensure the invitation is accepted and the user appears in your member list.

3. Disable Notifications for Other Members

Once the designated reporting email is added, you can restrict who receives the daily updates:

Navigate to Settings > Project > Members.
Locate the members who should no longer receive the reports.
Toggle off the email notification setting for those specific users.

Alternative: Email Forwarding

If you do not wish to add a new member to your Cekura project, you can set up an auto-forwarding rule within your own email client. You can configure your inbox to automatically forward any emails received from Cekura to your desired reporting address (e.g., pg-prod-reporting@prodigaltech.com). Note: If you find that certain members (such as Admins) are still receiving emails after disabling them in settings, please reach out to support, as this may be a known synchronization issue regarding administrative permissions.

How can I add the observability URL to an n8n flow instead of configuring it directly in Retell?

If you already have a post-call analysis workflow in n8n using Retell’s webhook slot, you can integrate Cekura observability by forwarding the call payload from your n8n flow to Cekura’s endpoint. This allows you to maintain your existing logic while still benefiting from Cekura’s monitoring and evaluation capabilities.

Step-by-Step Integration via n8n:

Receive the Webhook: Ensure your n8n workflow starts with a Webhook node that receives the call_analyzed or call_ended event from Retell.
Process Data (Optional): Perform any custom post-call analysis or data transformation within yourn8n flow as you normally would.
Add an HTTP Request Node: At the end of your flow (or at the desired point), add an HTTP Request node to forward the data to Cekura.
Configure the Node:
- Method: POST
- URL: Use your project-specific Cekura observability URL.
- Body Parameters: Send the JSON payload received from Retell. Cekura needs the transcript, recording URL, and any relevant metadata to perform evaluations.
Authentication: Include your Cekura API key in the headers if required by your specific project configuration.

Helpful Resources:

Documentation: For specific details on how to structure the forwarded response, please refer to the Cekura Retell Observability Guide.
Video Tutorial: You can also view a community-created walkthrough of this setup here: Cekura Integration Video.

Where else do I need to update the billing email address if invoices are still being sent to the previous email after an update?

If you have updated your account email but invoices are still being sent to a previous address, you likely need to update the contact information in the Billing Portal. Cekura separates account profile settings from billing and subscription management to allow for different administrative and financial contacts. How to update your billing email: 1. Log in to the Cekura platform and navigate to Settings. 2. Click on the Billing or Subscription tab. 3. Select Manage Subscription or Update Billing Info to access the billing portal. 4. In the portal, find the Billing Information section and update the email address to your preferred recipient. 5. Save your changes. This ensures all future invoices and receipts are sent to the updated address. If the issue persists, please contact support for manual verification.

How can I replay evaluation webhooks for calls that were already evaluated prior to the webhook integration?

Currently, Cekura does not support a direct ‘replay’ feature for webhooks on evaluations that occurred before the webhook integration was active. Webhooks are designed to send real-time notifications for new evaluation events as they happen. To populate your system with data from calls that have already been evaluated, you should use the Cekura API to fetch the historical data. You can retrieve details for past calls using the Cekura Observability API at https://docs.cekura.ai/api-reference/observability/get-call. Since the data structure returned by the API is consistent with the information sent in a webhook payload, you can pass the API response through your existing webhook handler or ingestion logic to ensure your records are up to date. For all calls evaluated after your integration, the webhooks will trigger automatically, ensuring your system stays synchronized in real-time.

Is there a way to setup multiple phone numbers for a single agent?

Yes, Cekura allows you to use multiple phone numbers for a single agent by using the phone number override feature during test execution. This is particularly useful for CI/CD pipelines where you might have different phone numbers assigned to different environments (e.g., dev, staging, and production) but want to test the same agent logic.

How it Works

Instead of creating separate agents for every phone number, you can maintain a single agent configuration and specify which phone number to dial at the time you trigger a test run.

Benefits

Simplified Management: You don’t need to duplicate agent settings or evaluators for different environments.
Version Control: Ensure that the version of the agent currently on your dev branch is being tested against the dev phone number before it is merged to main.
Dynamic Testing: Easily switch between different entry points for your AI voice agent without manual configuration changes in the Cekura dashboard.

How can I set up a custom metric with a numerical rating scale instead of a boolean pass/fail to capture more nuance in transcription accuracy evaluations?

While Cekura defaults to boolean (Pass/Fail) metrics because LLMs are generally more accurate with binary tasks, you can create custom Rating metrics to capture more nuance in your evaluations. Cekura supports a 0-5 rating scale for this purpose.

Steps to Create a Rating Metric

Navigate to Metrics: Go to the ‘Metrics’ section in your Cekura dashboard.
Create New Metric: Click the Create Metric button.
Select Type: Choose Rating as the metric type. Please note that Cekura uses a 0-5 scale rather than 0-10.
Describe the Metric: Provide a brief, clear description ofwhat the metric is evaluating (e.g., ‘Assess the accuracy of specific data points like dates and numbers mentioned in the transcript’).
Optimize: Click the Improve button. This allows the platform to refine the prompt logic to ensure the LLM understands the nuances of your request.
Review and Assign: Review the metric configuration and add it to your existing test scenarios.

Best Practices for Numerical Scoring

Define Score Meanings: To get the most accurate results, clearly define what each numerical value represents in your description. For example, specify what constitutes a ‘3’ versus a ‘5’.
Generic Use Cases: Custom rating metrics are most effective when the criteria are generic enough to be applicable across all or most of your calls.

For more detailed information on configuring metrics, you can refer to the Cekura Metrics Documentation.

How can I access an overall report of the results?

1. Accessing Overall Reports

You can view the summary and performance of your test runs in two main areas:

Result Page: This page provides a detailed breakdown of a specific run, including individual test cases and specific action items to improve your agent.
**Runs Overview Page:**This provides a high-level view of all your historical runs, allowing you to track progress over time.

How can I accurately track and report call quality metrics such as ghost inputs, transcript errors, and call termination reasons using the Cekura observability API?

To accurately track and report advanced call quality metrics in Cekura, you should leverage the observability API to send detailed call data, including specific metadata and termination reasons. This allows Cekura to evaluate the performance of your AI agent against expected outcomes.

1. Tracking Call Termination (User vs. Bot Hangups)

To improve the accuracy of the Appropriate Call Termination metric, you should explicitly send the reason the call ended. This helps Cekura distinguish between a user hanging up, the agent ending the call, or technical failures.

Field: call_ended_reason
Recommended Values: user ended, agent ended, connection lost, etc.
Impact: Providing this field significantly improves the quality of termination metrics and helps identify if a bot generated a prompt that the user never heard due to a hangup.
Reference: Cekura API - Send Calls

2. Handling Transcript Errors and ASR Inaccuracies

If you notice duplicate lines or inaccuracies in transcripts (often caused by ASR providers like Deepgram), ensure you are sending transcripts in a supported format. You can then leverage our “Transcription Accuracy” metric to detect inefficiencies in your transcript.

Best Practice: Use the transcript_json field to send structured data rather than raw text.
Documentation: Transcript Format Guide

4. Using Metadata vs. Dynamic Variables

Understanding how to pass extra data is crucial for fine-tuning evaluations:

Metadata: Use this for call-specific information (e.g., due_amount). During evaluation, Cekura can check if the bot correctly mentioned the specific due_amount passed in the metadata.
Dynamic Variables: These are used to replace variables within your Agent Description. This is specifically used for the Instruction Follow metric to ensure the agent is adhering to its persona based on changing parameters.

Why are transcripts with tool calls not being fetched from Retell, and could this be due to HIPAA compliance being enabled?

If you notice that transcripts or tool calls are missing from your Retell-integrated calls in Cekura, it is highly likely due to HIPAA compliance settings being enabled on your Retell account.

Why this happens

When HIPAA compliance is active on a platform like Retell, it often restricts the storage of sensitive data or prevents transcripts and tool call logs from being accessed via external APIs to ensure data privacy. Since Cekura requires these transcripts to perform evaluations and monitoring, these restrictions prevent the platform from fetching the necessary data for your dashboard.

Troubleshooting Steps

Verify Retell Settings: Log in to your Retell dashboard and check if HIPAA compliance is currently enabled.
Check Data Permissions: If HIPAA compliance is necessary for your use case, ensure that your configuration allows for the secure transmission of transcripts to authorized third-party platforms like Cekura.
Tool Call Metadata: Because tool calls are typically embedded within the transcript or call metadata, any restriction on the transcript will also result in missing tool call information.

For more details on how Cekura processes call data for monitoring, you can visit our Observability Overview. If you are manually fetching call data via our system, please refer to the Get Call API documentation.

How do I re-evaluate metrics on existing simulation runs?

You can re-run one or more metrics on simulation runs that have already completed, without re-running the full simulation. This is useful after updating a metric prompt to see how the revised logic performs against historical data.

Steps to Re-evaluate Metrics

Navigate to the Results page for your project.
Select the checkboxes next to the runs you want to re-evaluate. You can select individual runs or use the header checkbox to select all.
Click the Reevaluate Metrics button that appears in the toolbar.
Choose the specific metrics you want to rerun from the list.
Confirm — Cekura will queue the evaluations and update the results.

Note: Re-evaluating metrics consumes credits at the standard rate of 0.2 credits per metric evaluation. See the Credit Optimization section above for details.

Why is my dashboard showing empty data after re-evaluating metrics for previous calls, and what is the maximum time window for plotting this data?

If your dashboard appears empty after re-evaluating metrics for previous calls, it is typically due to one of two common configuration issues: agent filter mismatches or exceeding the supported time window.

1. Check Your Agent Filters

The most common cause for missing data is a mismatch between the agent selected in the dashboard filters and the agent for which the metrics were actually run.

The Issue: Dashboards often have a global agent filter located in the top-right corner. If this is set to a specific agent (e.g., Agent A), but you ran re-evaluations for a different agent (e.g., Agent B), the dashboard will correctly show no data for the selected filter.
The Fix: Verify the Agent ID used during your metric run and ensure the dashboard filter is either set to that specific Agent ID or set to ‘All Agents’.

2. Maximum Time Window Limit

Cekura currently supports a maximum time window of 30 days for plotting metric data on the dashboard.

The Issue: If you attempt to plot a date range wider than 30 days, the graph may fail to render data correctly.
The Fix: Adjust the date picker on your dashboard to a window of 30 days or less.

Summary Troubleshooting Steps:

Verify Agent ID: Confirm the ID of the agent you are testing.
Adjust Dashboard Filters: Ensure the top-right agent filter matches your test data.
Check Date Range: Ensure the selected time window does not exceed 30 days.
Refresh: Once filters are aligned, the re-evaluated metrics should populate the plots immediately.

For more details on how agent metadata influences metric reporting, you can refer to our guide on Metadata and Metrics.

Does Cekura store copies of conversation audio files (e.g., from Retell), and is the data encrypted?

Cekura’s data storage policy depends on the specific feature you are using within the platform.

Data Storage by Feature

Observability (Monitoring): When you use Cekura for monitoring conversations (such as sending call recordings or transcripts from providers like Retell), Cekura does store a copy of the data. This is necessary to perform evaluations based on your defined metrics and to provide historical analysis of your AI agent’s performance.
Testing: For the testing workflow—where you provide agent context, generate evaluators, and run test cases—Cekura does not store a copy of the conversation data.

Security and Encryption

Security is a top priority at Cekura. All data stored within our platform, including conversation audio and transcripts, is encrypted at rest.

Data Retention

Cekura retains monitoring data for one year by default, after which it is automatically removed. This applies to:

Call recordings and audio files sent via the observability endpoint
Transcripts and conversation data from monitored calls

Your account configuration data is not subject to this automatic deletion and remains available indefinitely:

Metrics (your metric definitions and evaluation settings)
Voices / agent personalities (the personality configurations you have set up)

If you need to delete monitoring data before the one-year period — for compliance or privacy reasons — you can do so at any time using Cekura’s delete endpoint. If your organisation requires a custom retention window (shorter or longer than one year), contact our support team.

Relevant Resources

If you are integrating your conversation data for monitoring, you can find more details on how to manage and retrieve call data in our documentation:

How is the pay-as-you-go (developer) plan priced, and do I get more credits when I add team members?

Cekura’s pay-as-you-go plan is designed for individual developers and small teams. The pricing structure is per seat:

First user: free
Each additional user: $30 per user per month

Credits are purchased separately from the seat price. Adding a team member does not automatically increase your credit allocation — credits are bought independently as needed. For higher credit volumes, concurrent call limits above 10, or multi-project access, contact support@cekura.ai to discuss available plans.

What is the maximum number of projects I can create on my plan?

The number of projects you can create depends on your subscription plan. The Developer plan includes a limit of 1 project. If you receive a 400: Max projects limit reached error when trying to create a new project, you have reached your plan’s project quota. To work across multiple isolated environments — for example, keeping a development project separate from a production project — you may need to upgrade to a higher plan. Contact support@cekura.ai to review available options or request a limit increase.

Is there a limit on how many scenarios I can run Improve Scenarios on at a time?

Yes. The Improve Scenarios feature has an enforced limit of 100 scenarios per batch. This is an intentional platform constraint, not a bug. If you need to apply improvements to more than 100 scenarios, run the feature in multiple batches. Use Cekura’s tag and filter tools to narrow your scenario selection to 100 or fewer at a time, apply Improve Scenarios to that batch, then repeat for the remaining scenarios.

How can I reduce high response latency (4-6 seconds) in the Testing Agent to prevent issues with silence detection systems?

High latency in the Testing Agent can disrupt silence detection systems and break the flow of a realistic conversation. If you are experiencing delays of 4-6 seconds, here are the recommended steps to optimize performance and reduce response times:

1. Adjust Personality Settings

The Testing Agent’s behavior is governed by its “Personality” configuration. You can reduce the initial delay by modifying the speaking plan:

Navigate to your Agent’s Personality settings.
Locate the Start Speaking Plan section.
Reduce the Wait Seconds value. Setting this to a lower value (e.g., 0 or 1 second) instructs the agent to initiate its response as soon as it detects the end of the user’s turn.

2. Use Structured Tests for Deterministic Responses

By default, the Testing Agent often uses an LLM to generate responses based on the conversation context, which adds processing time. To achieve faster, more deterministic responses:

Implement Structured Tests within your evaluators.
Use Fixed Messages for specific scenarios. This allows the agent to respond with a pre-defined string immediately upon detecting a trigger, bypassing the additional LLM query and significantly reducing latency.
This approach is ideal for standard greetings, specific verification steps, or any part of the test case where the response is predictable.
You can find detailed instructions here: Structured Tests Documentation

3. Understanding Latency Spikes

Latency can occasionally be affected by intermittent performance issues with transcribers or LLM providers. While Cekura works to accommodate these spikes within your configured wait times, using deterministic evaluators (Fixed Messages) is the most effective way to ensure the agent stays within the timing requirements of your silence detection system. If you continue to see latency that significantly exceeds your configured settings, please provide specific Run IDs or Call IDs to the support team for a detailed trace analysis.

Which LLM and TTS providers are commonly used in production voice agents?

Based on current usage across production voice AI deployments, the most common choices at each stage are:

STT: Deepgram — consistently low transcription latency for real-time applications.
LLM: GPT-4o, GPT-4.1, and Gemini 2.5/3 Flash — widely used for their balance of quality and response speed. Very recent frontier models tend to carry higher latency than these established mid-tier options, making them less suitable for real-time conversation.
TTS: ElevenLabs and Cartesia — both offer high-quality synthesis with low time-to-first-audio and are the most common choices for production voice agents.

For architecture guidance and latency targets at each stage, see Best practices for achieving low end-to-end latency.

What are best practices for achieving low end-to-end latency in a cascading voice AI pipeline (STT → LLM → TTS)?

A cascading pipeline routes audio through three sequential services — Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS) — so each stage adds to the overall response time. With the right model choices and architecture, end-to-end latency of around 2 seconds is achievable.

Model selection

Choose providers optimised for inference speed at each stage:

STT: Deepgram offers consistently low transcription latency and is a common choice for real-time voice applications.
LLM: GPT-4o, GPT-4.1, and Gemini 2.5/3 Flash are widely used in production for their balance of quality and response speed. Smaller variants (e.g. GPT-4o mini) can reduce latency further but may trade off response quality. Very recent frontier models often carry higher latency than the established mid-tier options noted above.
TTS: ElevenLabs and Cartesia both provide high-quality synthesis with low time-to-first-audio and are the most common choices for production voice agents.

Pipeline orchestration

Frameworks like Pipecat are well-suited to cascading voice pipelines. Pipecat handles the streaming handoff between STT, LLM, and TTS stages and keeps end-to-end latency predictable.

Regional considerations

If your users are geographically distant from your model providers’ primary data centres (common for EU deployments when using US-hosted APIs), check whether each provider offers regional endpoints. Routing to a closer endpoint can meaningfully reduce round-trip time at each stage.

Monitoring with Cekura

Once your pipeline is running, use Cekura’s latency metric to track per-turn response times across test runs, and the metadata field to tag calls by region or configuration variant so you can compare latency across deployment contexts.

What is the typical p95 latency seen in the market for voice AI agents?

Based on empirical data across voice AI deployments, the mean p95 latency (excluding tool calls) for production voice agents is typically in the 4–5 second range. A 3-second p95 target is below the market average and a competitive benchmark to aim for.

Around 2s: Achievable with an optimised cascading pipeline (fast STT + small LLM + low-latency TTS). See best practices for low end-to-end latency for how to reach this range.
3s: A reasonable target — competitive and noticeably better than the market mean.
4–5s: Typical market mean p95 for production voice agents.

Use Cekura’s latency metric to track your agent’s per-turn p95 across test runs and production calls, and use metadata tagging to segment by provider, region, or configuration to identify where latency accumulates.

Why does the latency Cekura reports differ from what my voice platform’s dashboard shows?

Voice platforms such as Retell typically report internal pipeline latency — the time from when speech recognition (ASR) finishes to when the LLM generates a response, which may exclude TTS rendering and the final audio-delivery step. Cekura measures end-to-end (E2E) latency: the full gap from when the Testing Agent finishes speaking to when the Main Agent’s audio begins, capturing every stage — STT, LLM inference, TTS rendering, and network round-trip time. Because Cekura’s measurement is more comprehensive, it reads higher than a provider’s own dashboard figure. A gap of 800–1000 ms between the two is typical. A Cekura latency reading of 1500–2000 ms is generally good for a production voice agent. See the Latency metric reference for how per-turn values are computed.

Which Speech-to-Text (STT) provider works best for real-time multilingual conversations?

The right STT provider depends on whether your primary concern is multilingual language coverage or raw transcription latency:

ElevenLabs is the recommended choice for multilingual real-time use cases (including English/Hindi). It handles cross-language audio well and is optimised for conversational voice AI workflows.
Deepgram is the recommended choice when minimising transcription latency is the priority, particularly for English-first use cases.
OpenAI (Whisper) and Google (Gemini) provide strong transcription accuracy but carry higher latency than ElevenLabs and Deepgram, which can affect responsiveness in real-time conversations.

The best provider often varies by audio characteristics (telephony audio vs. high-quality microphone) and language mix. Use Cekura’s Transcription Accuracy metric to compare providers objectively on your own audio before committing to one in production.

How can I bulk reassign evaluators to another agent and prevent associated test cases from being deleted when an agent is removed?

When managing agents and their associated evaluators (test cases) in Cekura, you have full control over how data is preserved during deletions and how it is reassigned to new agents.

Preventing Evaluator Deletion

When you delete an agent, Cekura provides a safeguard to ensure you don’t lose your test cases accidentally:

Initiate the deletion of the agent.
A confirmation prompt will appear asking if you want to delete the evaluators associated with that agent.
Unselect/Uncheck the option to delete associated evaluators.
Confirm the deletion.

By following these steps, the agent will be removed while your evaluators remain intact in your library.

Bulk Reassigning Evaluators

To move multiple evaluators to a different agent efficiently, you can use the Improve Scenarios feature:

Navigate to the Improve Scenarios chat interface.
This interface allows you to bulk edit the metadata of your evaluators.
Select the evaluators you wish to move and update the associated agent field to the new target agent.

This is the recommended workflow when you are migrating tests from a default or temporary agent to a specific production-ready agent configuration. For more details on configuring your testing environment, please visit the Agent Setup Guide.

Is there a way to mass update expected outcomes or tags for a particular set of scenarios using the ‘improve scenarios’ feature?

Yes, Cekura provides a powerful Improve Scenarios feature that allows you to perform bulk updates on your test cases using a natural language chat interface. This feature is designed to eliminate manual editing by allowing you to describe the changes you want to make to a group of scenarios at once.

How to Mass Update Expected Outcomes

To update the expected outcomes for a specific group of scenarios, follow these steps:

Filter Scenarios: Use the filter options in your scenario list (e.g., filter by a specific tag) to isolate the scenarios you wish to update.
Select Scenarios: Select the checkboxes for the scenarios you want to modify.
Open Improve Scenarios: Click the Improve Scenarios button. This will open a chat agent interface.
Provide Instructions: Simply tell the agent what you want to change. For example: “Update the expected outcome for all selected scenarios to ensure the agent asks for a zip code.”
Execute: The AI will process the request and update the selected scenarios automatically.

Mass Updating Tags

Currently, the Improve Scenarios tool is optimized for updating scenario content and expected outcomes. While you can filter by tags to select your scenarios, the ability to mass update or remove tags directly via the chat agent is a feature that is currently being added and will be available in the coming days.

Can I duplicate evaluators, agents, or metrics across projects?

Evaluators can be duplicated across projects. Select one or more evaluators in the dashboard, click Actions > Duplicate, then choose the target project and folder. This copies the evaluators to the new project without modifying the originals, giving you a consistent baseline that you can then customize per project. See Organizing Projects for a full walkthrough. Agents and metrics do not have a built-in UI duplication tool. To replicate agent or metric configuration across projects, use the REST API or the MCP server to create them programmatically with the same settings. For environment separation (e.g., staging vs. production), another option is to keep all agents in a single shared project and use separate API keys or agent IDs to route webhook payloads to different backend environments. This avoids the need to keep multiple copies of metrics and evaluators in sync. See Organizing Projects for recommended patterns.

How do I assign an evaluator to a specific folder when creating it via the API?

Include the folder_path field in your request body when calling the Create Evaluator endpoint. The value is a dot-separated path string matching the folder hierarchy in your project — for example, "Booking.Happy path" or "Sales.Inbound". The folder must exist before you reference it. Use the Create Folder endpoint (POST /test_framework/v1/scenarios/create_folder/) to create the folder first. If you omit folder_path, the evaluator is placed in the project root. Opening a root-level evaluator in the UI will prompt you to assign it to a folder before you can save changes.

How do I find or create my Cekura API key?

Navigate to Settings → Organization → API Key in the Cekura dashboard (direct link: dashboard.cekura.ai/dashboard/settings/org/api-key). From there you can copy an existing key or create a new one. Admin access required. The Organization section of Settings is only visible to users with Admin role. If you don’t see it, ask an admin on your organization to create or share a key with you. For more on role permissions, see Enterprise Setup.

I already post call data to the observability endpoint from my Pipecat agent — can I also add the Pipecat SDK?

The Pipecat SDK (observe_pipeline() / observe_and_create_task()) and the direct observability endpoint (POST /observability/v1/observe/) are alternative, mutually exclusive integration paths — they cannot be combined for the same session. Each method independently creates its own call record in Cekura. If both run for the same session, Cekura creates two separate records; there is no deduplication or merging. The SDK is designed to replace the direct API for Pipecat-based agents.

Migrating from the direct API to the SDK

If you have an existing integration that posts call data to observability/v1/observe/ from a post-processing step, follow these steps before adding the SDK:

Add observe_pipeline() (or observe_and_create_task()) to your Pipecat pipeline.
Move any metadata from your direct API call to set_custom_metadata({...}) — see Custom Metadata in the Pipecat Tracing docs. This can be called in a cleanup or post-processing handler that runs before the pipeline closes.
Remove the observability/v1/observe/ call from your post-processing step.

If I already capture and store audio in my own Pipecat pipeline, will the Cekura SDK record it again?

Yes — when you use observe_pipeline() (or observe_and_create_task()), the SDK registers its own audio frame processor and performs a parallel recording alongside any existing audio capture you have in your pipeline. This is intentional: the SDK needs its own recording to power Cekura’s observability and evaluation features. The parallel recording is designed to have minimal impact on your pipeline:

Asynchronous: All SDK recording tasks run asynchronously and do not block your main pipeline or add latency to conversations.
Chunked multipart upload: Audio is uploaded to Cekura in small chunks, keeping memory consumption low even during long sessions.

Your existing audio artifacts are unaffected — the SDK records independently and does not interfere with how you store or process audio elsewhere in your pipeline. If you want to avoid the parallel recording entirely, use track_pipeline() instead — see Can I disable or selectively control audio recording in the Pipecat SDK? for the full set of recording control options.

Can I disable or selectively control audio recording in the Pipecat SDK?

The Pipecat SDK offers several recording control options depending on your requirements: No recording — use track_pipeline() (or track_and_create_task()): These methods capture transcripts, tool calls, logs, and OTel traces but perform no audio recording. Use this for simulation testing environments or any case where audio recording is not permitted. Stop recording mid-call — use abort(): Calling await tracer.abort() halts all capture immediately and prevents any data from being sent to the Cekura backend. This is useful if consent is withdrawn partway through a call and you want to ensure nothing is stored. Disable the tracer entirely: Set the environment variable CEKURA_TRACER_ENABLED="false". When set, the tracer is a no-op — no data is captured and nothing is sent to Cekura. Per-call selective recording (starting without recording and enabling it mid-call): This is not currently supported. There is no built-in mechanism to start a call without recording and then begin recording once consent is confirmed mid-conversation. If your workflow requires starting audio capture at a specific point in the conversation, the recommended approach is to use track_pipeline() for calls where consent has not yet been obtained, and observe_pipeline() for sessions where consent is established upfront. If you need a more fine-grained per-call recording toggle, reach out via your support channel.

What can I do on the sandbox plan — does it allow real pre-production testing?

The sandbox plan gives access to the full Cekura platform. You can run real pre-production tests against your agents, use all testing, evaluation, and observability features, and validate your integration before going live. If you run out of credits during your sandbox phase, reach out via your support channel to request a top-up.

Can I assign the same contact number to multiple agents?

Yes — contact_number does not need to be unique per agent. The same phone number can be assigned to multiple agents without causing conflicts. When you run a scenario, the agent attached to that scenario is what Cekura evaluates, so there is no ambiguity about which agent is under test regardless of shared numbers. Make sure each evaluator (scenario) has the correct agent selected before triggering a run. For more details on setting up and managing your testing environment, please visit the Cekura Documentation.

Is there a way to recover call logs that were missed due to insufficient credits?

If your account runs out of credits, Cekura stops ingesting and processing new calls. The auto-fetch feature does not backfill calls that were missed during a credit gap — only calls that arrive after credits are restored are picked up going forward. To analyze calls from the missed window, you must manually resend them using the Send Calls API:

Identify the timeframe of the gap (start and end timestamps).
Export the call recordings or transcripts from your voice provider or internal systems for that window.
POST them to the Send Calls endpoint to ingest them into Cekura for evaluation.

To prevent future gaps, set up a low-credit alert in your dashboard so you can top up before service is interrupted.

Does Cekura provide phone numbers for all countries, and what should I do if my region is not covered?

Cekura does not provide phone numbers for all regions. If you need a testing number in a region Cekura doesn’t cover natively, you can bring your own number from a third-party telephony provider such as Plivo or Twilio and import it into Cekura. Once imported, the number works as a normal Cekura testing number — your system calls it and Cekura’s testing agent answers. No additional integration with the telephony provider is required beyond the import step. See the Plivo import guide for step-by-step instructions.

How do I submit a bug report or feature request?

Email support@cekura.ai — this routes directly to the engineering team. For bugs, include a brief description of the issue, any relevant run IDs, evaluator URLs, or dashboard links, and your expected vs. actual behavior. For feature requests, a short description of the use case is sufficient. The team will triage and follow up in your support channel.

Does Cekura support threshold-based alerts, and how should I configure alerts for a specific metric?

Cekura does not currently support standard fixed-threshold alerts (for example, “alert me if this score falls below 80%”). The only supported alert type for monitoring is Significant Change Alerts.

Why Significant Change Alerts instead of fixed thresholds?

LLM judge metrics are non-deterministic — the same agent interaction can receive slightly different scores across evaluations. A fixed threshold would generate excessive noise in this environment. Significant Change Alerts work by establishing a statistical baseline and triggering only when the metric deviates meaningfully from that baseline:

Cekura computes a rolling baseline (e.g., 90% success rate on your adherence metric).
You configure a sensitivity margin based on standard deviation.
The alert fires only when the metric falls outside that margin — for example, dropping to 84% when the baseline is 90% and the margin is 5%.

This approach surfaces real regressions while ignoring normal LLM variance.

How to set up an alert

Navigate to the Alerts section in the Cekura dashboard.
Click Add New Alert in the top right corner.
Select the agent and metric you want to monitor.
Configure the Significant Change Alert sensitivity to match your needs.

For more on how metrics are structured, see the Metrics documentation.

Is there a way to pull observability call data across multiple projects in a single API call?

No — the List Calls API requires you to scope results to a single project or agent. Cross-project querying in a single call is not supported; the database is indexed at the project level, and cross-project joins are prohibitively expensive. When using an organization-level API key, providing either project_id or agent_id is mandatory to scope the query to the correct project.

Workarounds if your data is split across projects

Multiple API calls: Make one request per project_id and merge the results in your own code.
Consolidate into one project: If you recently migrated an agent to a new project and data is now split, the Cekura team can help move historical data into a single project. Reach out via your support channel to request this.

How can I view my total credit usage or expenses from a previous month?

The Cekura dashboard does not currently provide a single consolidated “last month” credit summary or billing history export. What is available:

Settings → Billing: Use the date filter to view credit usage over a selected window. The maximum filter window is 30 days, so you can manually set a range that covers the month you want to review. This gives you an indication of usage for that period rather than a pre-built monthly report.
Subscription invoices: For subscription fees and payment history, go to Settings → Billing → Manage Plan to access the Stripe billing portal.
Detailed usage on request: If you need a precise breakdown (credits consumed by type, overages vs. subscription, etc.), the Cekura team can pull this for you. Reach out via your support channel.

A full billing history view is on the roadmap.

Does Cekura require access to my codebase during onboarding?

Codebase access is preferred but not mandatory. The reasons Cekura requests it:

Automated improvement PRs: With repository access, Cekura can push pull requests directly that improve your agent based on evaluation results — for example, updating prompts, adjusting tool configurations, or refining mock data.
Faster initial integration: Setting up mock data, tool call schemas, and traces is significantly quicker when Cekura can inspect your existing code structure rather than receiving it piecemeal.

If you prefer not to grant codebase access, the integration can still proceed by coordinating directly with a member of your technical team. In that case, Cekura would typically need the following provided manually:

Event and data schemas relevant to the metrics being built
How your system emits traces, logs, or instrumentation data
Tool call definitions and metadata fields used in conversations

The trade-off is a slower initial setup and more back-and-forth to get mock data and traces configured correctly. You can also grant access for the initial integration period and revoke it afterward. If you have questions about the specific access scope required for your engagement, ask in your support channel and a team member will clarify what applies to your setup.

What does “Create Evaluator from Call” actually do, and can I use it to replicate a production failure as a test case?

When you use the “Create Evaluator from Call” (or “Create Scenarios from Call”) feature on a production call log, Cekura reads the transcript of that call and generates a new structured evaluator that recreates the same conversation flow, intent, and expected outcomes. The resulting evaluator can then be run repeatedly as a regression test — this is the recommended approach for turning a specific production failure into a repeatable test scenario. For finer control over branching conditions, turn sequences, or IVR flows, build the evaluator directly using Conditional Actions. Important limitation for STT testing: When the evaluator is run, the Testing Agent replays the transcript text using synthetic TTS audio (ElevenLabs), not the original production audio recording. This means the evaluator cannot be used to test a different STT provider against the original audio — the STT input will be re-synthesised speech, not the real recording. If your goal is to benchmark STT providers against real production audio, the correct approach is:

Pull the call log for each target call via the Cekura API to get the per-turn start_time and end_time values from the transcript JSON (see Transcript Format).
Use those timestamps to slice the original audio recording into per-turn segments (e.g. with ffmpeg).
Send those audio segments to each STT provider you want to compare and compute WER against a ground-truth transcription.

The Cekura transcript provides the timing data needed for step 1; the STT comparison pipeline itself runs outside Cekura.

Can I test my agent using audio recordings or audio files I provide?

The Testing Agent generates its own synthetic speech for every run — it does not accept externally provided audio files as test inputs. To replicate a specific failure scenario or conversation flow, use Structured Tests: write a rule-based evaluator whose conditions follow the conversation path you want to reproduce, and the Testing Agent will simulate that sequence with its own synthesised voice. If the failure you want to reproduce came from a production call already captured in Cekura, the Create Evaluator from Call feature converts that call’s transcript into an evaluator, preserving the turn-by-turn dialogue as conditions. The Testing Agent replays the conversation using synthetic audio rather than the original recording. For STT benchmarking against original audio recordings, see What does “Create Evaluator from Call” actually do?.

How can I view all historical transcripts for a specific evaluator?

You can access all past runs for a specific evaluator using the filtering and grouping features on the Results page:

On the Results page, use the Filter by evaluator name option to scope the view to the evaluator you care about.
Select the filtered runs you want to review, then use the Create Result action (sometimes labelled “Group into Result” or “Club Runs”) to merge them into a single combined result. This brings all selected runs into one result view where you can navigate between individual conversation transcripts.

This workflow lets you review transcripts across runs to spot when a regression was introduced, compare responses across LLM versions, or audit how the agent has behaved over time on that specific test case. For more on organising and filtering results, see Organising Projects.

Is there a way to add notes or annotations to individual test call runs?

Currently, Cekura does not have a dedicated free-form note-taking feature for individual test call runs. The closest option is Metric Lab annotations: within a Labs review session you can click the feedback/notes icon on any evaluation row and write a note explaining why a call passed or failed a given metric. These notes are primarily designed to help refine the metric definition via Auto Improve, but they also serve as a per-call qualitative record tied to that metric. See the Metric Lab guide for the full annotation workflow. If you need observations that are not tied to a specific metric, the typical approach is to maintain an external log alongside the Cekura run IDs — referencing the run ID later links your notes back to the dashboard record. If you find yourself repeatedly noting the same category of failure, consider creating a dedicated metric for it. That automates the tracking and surfaces the pattern directly in your results without manual note-keeping. If you are using Observability to monitor production calls (rather than running evaluations in the test framework), note that individual call logs in that view support a per-metric feedback mechanism: you can vote thumbs-up or thumbs-down on a specific metric evaluation result and add a free-text note explaining why the evaluation was correct or incorrect. This feedback is accessible from the call log detail view and from Slack alert notifications (via the 👎 button next to Go to call). Like Metric Lab annotations, this feedback is tied to specific metric evaluations rather than being a free-form call-level note. See Creating a Good Metric for the annotation workflow.

Why is a count metric failing even when the measured value is zero?

This happens when the Rubric threshold for that metric is configured with the wrong comparison direction. In Cekura, a metric measures a value — for example, the number of unnecessary repetitions — and the Rubric determines whether that value is a pass or fail based on an operator and threshold you configure. For “lower is better” count metrics (such as Unnecessary Repetition Count), a measured value of 0 is the best possible result. However, if the rubric rule uses “greater than or equal to” with a threshold like 3, the evaluation fails whenever the count is below 3 — including 0 — because the rule requires a minimum count to pass, which is the reverse of what you want.

How to fix it

Open the evaluator’s Rubric settings (click View Rubric on the metric) and update the operator to “less than or equal to” with a threshold that sets the maximum acceptable count. For example:

To fail only when repetitions are detected at all: Unnecessary Repetition Count ≤ 0
To allow up to 2 repetitions before failing: Unnecessary Repetition Count ≤ 2

The same principle applies to any metric whose interpretation says “Lower is better” (see the Interpretation note on each entry in Pre-defined Metrics): use ≤ for those, and ≥ only for metrics where higher values indicate better performance (such as boolean pass/fail metrics, which use ≥ 5).

How does Cekura handle decision-level reasoning and explainability, and what data do I need to provide for audit purposes?

Cekura takes model reasoning into account when explaining why specific outcomes occurred and why evaluations passed or failed. Every LLM Judge and Python metric evaluation produces an explanation field that describes why the rubric was or wasn’t met. For workflows where you need a complete audit trail — including the internal logic behind each decision, not just whether the outcome was correct — you need to provide the reasoning data yourself, because Cekura cannot reconstruct internal agent decision logic that isn’t present in the transcript or the data you send.

What to send and how

Your LLM generates reasoning tokens as part of its output. Forward the full reasoning to Cekura via the metadata field of your observability ingest payload alongside the standard model output. Key points:

Send the complete reasoning trace. There is no cost or performance overhead within Cekura for including reasoning in metadata — only metric evaluations consume credits (0.2 credits per evaluation). Metadata content has no additional charge.
Metadata is immutable after the initial POST. Collect and include the reasoning before sending; it cannot be added or modified after the call record is created.
Alternative — OpenTelemetry tracing. If your agent’s decisions are expressed as tool calls, you can instead emit reasoning via OTel tracing. tool_call spans can carry function.input / function.output attributes, so decision rationale attached to tool calls flows into Cekura and appears linked to the call record.

Once the reasoning is present in the metadata or OTel spans, Cekura surfaces it in the evaluation report alongside the pass/fail outcome, giving you a per-decision audit trail.

How do I keep my Cekura skills up to date when new skills are released?

How you update depends on how you installed the skills. Claude Code marketplace: Run /upgrade-skills in any Claude Code session. To get updates automatically at every session startup, open the plugin manager (/plugin), go to the Marketplaces tab, select cekura-skills, and enable auto-update. Other agents (Cursor, Codex, Windsurf, etc.): Auto-update is not available — unlike the Claude Code marketplace, there is no background refresh option for these agents. Run the update command manually whenever you want the latest skills:

# Refresh existing skills
npx skills update

# Or refresh existing AND pick up any newly-added skills
npx skills add cekura-ai/cekura-skills --all

Cekura pushes skill updates whenever the underlying API or MCP server changes, so running the update command is all that is needed on your side. For full installation and update instructions, see Skills.

Can I compare Cekura metric results against my own manual or ground-truth scores?

Yes. Write a Python metric that reads the results of other Cekura metrics already evaluated on the same call, then applies your own comparison logic. Inside any Python metric, every other metric that ran on that call is available by name in the data dictionary:

# Read a Cekura metric result
cekura_score = data["Quality Score"]

# Read a manually supplied value from a Test Profile
manual_score = data["test_profile"]["ManualScore"]

# Compare and produce a result
within_tolerance = abs(cekura_score - manual_score) <= 5
_result = within_tolerance
_explanation = f"Cekura score {cekura_score} vs manual {manual_score} — {'within' if within_tolerance else 'outside'} tolerance"

You can supply the manual score via a Test Profile so it varies per test case without editing the metric code. For the full data dictionary reference, see Metric Results Access.

Changelog

⌘I

​Frequently Asked Questions

​How do credits work in Cekura, including pricing for testing, monitoring, and overages?

​Understanding Credits and Pricing in Cekura

​1. Credit Consumption Rates

​2. Usage Examples

​3. Overages and Manual Top-ups

​4. BestPractices for Credit Optimization

​Can I see logs of the webhooks Cekura sends to my endpoint?

​Where can I upload knowledge base files on the Cekura dashboard and what is the recommended format for an FAQ-based document?

​How to Upload Knowledge Base Files

​Recommended Formats and Content

​If I add a custom metric to a specific agent, will it automatically apply to all agents or can it be applied to a single agent only?

​Agent vs. Project Metrics

​How to Manage and Create Metrics

​What is the best way to perform A/B testing to compare two different conversational agents?

​Is there a way to export call data or share reports from the Calls page or dashboards?

​1. CSV Export of Selected Calls

​2. Dashboards with Daily Reports

​3. Using the Cekura API

​4. Custom Reporting Support

​Is there a way to update or add metadata to data after it has been posted to the observability endpoint?

​Best Practices for Metadata

​When making an inbound call to the system for testing, do I need to manually trigger the conversation?

​1. If your agent receives calls (Inbound Agent)

​2. If your agent makes calls (Outbound Agent)

​3. Automatic Triggering via Integrations

​How can I request a copy of your SOC 2 report?

​How can I obtain a Business Associate Agreement (BAA) for HIPAA compliance?

​What legal documents does Cekura provide for enterprise due diligence or legal review?

​How can I change the inbound number used to make calls to the main agent?

​How can I integrate Cekura Cronjob success and failure notifications directly into a Datadog dashboard using webhooks?

​Recommended Integration Workflow

​How do I choose a subscription package for load testing, and how can I perform load testing for inbound and outbound calls?

​Subscription Packages and Concurrency

​How to Perform Load Testing (Inboundand Outbound)

​Can I define custom tools or scripts to perform setup and cleanup tasks before and after a Cekura test for end-to-end integration testing?

​Recommended Workflow for Integration Testing

​Why are successful parallel outbound calls showing as timeouts in the dashboard?

​Common Causes for Timeouts

​Best Practices for Parallel Testing

​How can I configure a custom domain and logo for my reports, and what should I do if I encounter an error during the domain setup?

​How to Configure Custom Branding

​Troubleshooting and Plan Restrictions

​How can I perform end-to-end testing of a chatbot’s LLM flow via SMS to ensure all conversation types are handled correctly?

​How to Set Up SMS Testing

​Integration Options for Custom Backends

​Key Benefits

​How can I correlate incoming calls with the specific evaluation runs that triggered them in the Cekura API?

​1. Randomized Phone Numbers (Immediate Correlation)

​2. Custom Integration (Deterministic Correlation)

​Note for Retell and Vapi Users

​What does the response consistency metric test for?

​How do I configure a Retell chat agent in Cekura and ensure that the chat tests appear on the dashboard?

​1. Prepare your Retell Agent

​2. Connect Retell to Cekura

​3. Run and Monitor Tests

​What is the best way to evaluate a scenario where a caregiver or family member picks up instead of the patient, and should I adjust agent settings or generate evals for this?

​Step-by-Step Instructions

​Why this approach?

​Will incoming calls be restricted to a specific phone number, or can I still receive calls from multiple different numbers?

​Does Cekura support a webhook for receiving call observability metrics and evaluation results to store in an external database for analytics?

​How it Works

​Key Benefits

​Documentation

​Which model does Cekura use to calculate the metric results?

​How can I link an ElevenLabs account to view conversation IDs and tool call timestamps for evaluator test calls?

​Why link your ElevenLabs account?

​How should I format transcript data for Cekura’s observability API if my source only provides a single timestamp and not an end time?

​Recommended Workaround

​Important Note on Latency

​Example Transformation

​How can I add Time to First Audio (TTFA) as an infrastructure metric for a Pipecat and Twilio setup?

​Accessing Latency Metrics

​Handling Complex Scenarios with Transcript Data

​Implementation Workflow

​How do I set up the Cekura Slack integration to receive evaluation results and call alerts in a channel?

​Is screen recording or video/microphone use disabled in the Cekura dashboard?

​Does the company provide a Data Processing Agreement (DPA) and what is the status of its GDPR compliance?

​What are Cekura’s EU data residency options and standard contract terms?

​EU Data Residency

Frequently Asked Questions

How do credits work in Cekura, including pricing for testing, monitoring, and overages?

Understanding Credits and Pricing in Cekura

1. Credit Consumption Rates

2. Usage Examples

3. Overages and Manual Top-ups

4. BestPractices for Credit Optimization

Can I see logs of the webhooks Cekura sends to my endpoint?

Where can I upload knowledge base files on the Cekura dashboard and what is the recommended format for an FAQ-based document?

How to Upload Knowledge Base Files

Recommended Formats and Content

If I add a custom metric to a specific agent, will it automatically apply to all agents or can it be applied to a single agent only?

Agent vs. Project Metrics

How to Manage and Create Metrics

What is the best way to perform A/B testing to compare two different conversational agents?

Is there a way to export call data or share reports from the Calls page or dashboards?

1. CSV Export of Selected Calls

2. Dashboards with Daily Reports

3. Using the Cekura API

4. Custom Reporting Support

Is there a way to update or add metadata to data after it has been posted to the observability endpoint?

Best Practices for Metadata

When making an inbound call to the system for testing, do I need to manually trigger the conversation?

1. If your agent receives calls (Inbound Agent)

2. If your agent makes calls (Outbound Agent)

3. Automatic Triggering via Integrations

How can I request a copy of your SOC 2 report?

How can I obtain a Business Associate Agreement (BAA) for HIPAA compliance?

What legal documents does Cekura provide for enterprise due diligence or legal review?

How can I change the inbound number used to make calls to the main agent?

How can I integrate Cekura Cronjob success and failure notifications directly into a Datadog dashboard using webhooks?

Recommended Integration Workflow

How do I choose a subscription package for load testing, and how can I perform load testing for inbound and outbound calls?

Subscription Packages and Concurrency

How to Perform Load Testing (Inboundand Outbound)

Can I define custom tools or scripts to perform setup and cleanup tasks before and after a Cekura test for end-to-end integration testing?

Recommended Workflow for Integration Testing

Why are successful parallel outbound calls showing as timeouts in the dashboard?

Common Causes for Timeouts

Best Practices for Parallel Testing

How can I configure a custom domain and logo for my reports, and what should I do if I encounter an error during the domain setup?

How to Configure Custom Branding

Troubleshooting and Plan Restrictions

How can I perform end-to-end testing of a chatbot’s LLM flow via SMS to ensure all conversation types are handled correctly?

How to Set Up SMS Testing

Integration Options for Custom Backends

Key Benefits

How can I correlate incoming calls with the specific evaluation runs that triggered them in the Cekura API?

1. Randomized Phone Numbers (Immediate Correlation)

2. Custom Integration (Deterministic Correlation)

Note for Retell and Vapi Users

What does the response consistency metric test for?

How do I configure a Retell chat agent in Cekura and ensure that the chat tests appear on the dashboard?

1. Prepare your Retell Agent

2. Connect Retell to Cekura

3. Run and Monitor Tests

What is the best way to evaluate a scenario where a caregiver or family member picks up instead of the patient, and should I adjust agent settings or generate evals for this?

Step-by-Step Instructions

Why this approach?

Will incoming calls be restricted to a specific phone number, or can I still receive calls from multiple different numbers?

Does Cekura support a webhook for receiving call observability metrics and evaluation results to store in an external database for analytics?

How it Works

Key Benefits

Documentation

Which model does Cekura use to calculate the metric results?

How can I link an ElevenLabs account to view conversation IDs and tool call timestamps for evaluator test calls?

Why link your ElevenLabs account?

How should I format transcript data for Cekura’s observability API if my source only provides a single timestamp and not an end time?

Recommended Workaround

Important Note on Latency

Example Transformation

How can I add Time to First Audio (TTFA) as an infrastructure metric for a Pipecat and Twilio setup?

Accessing Latency Metrics

Handling Complex Scenarios with Transcript Data

Implementation Workflow

How do I set up the Cekura Slack integration to receive evaluation results and call alerts in a channel?

Is screen recording or video/microphone use disabled in the Cekura dashboard?

Does the company provide a Data Processing Agreement (DPA) and what is the status of its GDPR compliance?

What are Cekura’s EU data residency options and standard contract terms?

EU Data Residency