Get Result - Cekura

Retrieve a test result

curl --request GET \
  --url https://api.cekura.ai/test_framework/v1/results/{id}/ \
  --header 'X-CEKURA-API-KEY: <api-key>'

{
  "id": "integer",
  "name": "string",
  "agent": "integer",
  "status": "string",
  "met_expected_outcome_count": "integer",
  "total_expected_outcome_count": "integer",
  "success_rate": "float",
  "run_as_text": "boolean",
  "is_cronjob": "boolean",
  "runs": {
    "run_id": {
      "id": "integer",
      "scenario": "integer",
      "outbound_number": "string",
      "expected_outcome": {
        "score": 100,
        "explanation": [
          "✅ Positive outcome explanation with checkmark emoji",
          "❌ Negative outcome explanation with X emoji"
        ],
        "outcome_alignments": [
          {
            "outcome": "string",
            "prompt_part": "string",
            "aligned": "boolean"
          }
        ]
      },
      "success": "boolean",
      "evaluation": {
        "metrics": [
          {
            "id": "integer",
            "name": "string",
            "type": "binary_workflow_adherence | binary_qualitative | continuous_qualitative | numeric | enum",
            "score": "number",
            "explanation": "string | array",
            "function_name": "string (optional)",
            "extra": {
              "categories": [
                {
                  "category": "string",
                  "deviation": "string (optional)",
                  "priority": "string (optional)"
                }
              ],
              "percentiles": {
                "p50": "number"
              }
            },
            "enum": "string (for enum type metrics only)"
          }
        ]
      },
      "timestamp": "datetime",
      "executed_at": "datetime",
      "error_message": "string",
      "status": "string",
      "duration": "string (MM:SS format)",
      "scenario_name": "string",
      "personality_name": "string",
      "metadata": "object",
      "inbound_number": "string"
    }
  },
  "overall_evaluation": {
    "success_rate": "number",
    "metric_summary": {
      "metric_id": {
        "id": "integer",
        "name": "string",
        "type": "string",
        "score": "number",
        "explanation": "string (optional)",
        "function_name": "string",
        "vocera_defined_metric_code": "string (optional)",
        "p50": "number (for numeric metrics)"
      }
    },
    "worst_performing_metrics": {
      "binary_adherence": [
        "array of metric_ids"
      ]
    },
    "numeric_metrics": [
      {
        "name": "string",
        "type": "numeric",
        "value": "number",
        "percentiles": {
          "p50": "number"
        }
      }
    ],
    "enum_metrics": [
      "array of metric_ids"
    ],
    "extra_metrics": [
      {
        "name": "string (e.g., 'Expected Outcome', 'Average Ringing Duration')",
        "type": "string",
        "value": "number",
        "percentiles": {
          "p50": "number (optional)"
        }
      }
    ]
  },
  "total_duration": "string (MM:SS format)",
  "total_runs_count": "integer",
  "completed_runs_count": "integer",
  "success_runs_count": "integer",
  "failed_runs_count": "integer",
  "scenarios": [
    {
      "id": "integer",
      "name": "string"
    }
  ],
  "critical_categories": "array",
  "metrics": "array",
  "domain": "string (nullable)",
  "domain_logo": "string (nullable)",
  "runs_by_tags": "object",
  "latency_data": "object",
  "failed_reasons": "array",
  "run_settings": {
    "override_value": 10,
    "frequency": 2,
    "concurrency_limit": 5,
    "personality_ids": [
      3,
      7
    ],
    "test_profile_ids": [
      20
    ],
    "mode": "same_number",
    "livekit_data": {
      "agent_name": "kit",
      "config": {},
      "url": "wss://example"
    }
  },
  "created_at": "datetime",
  "updated_at": "datetime"
}
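The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_result_request` and `fetch_result` are hypothetical, and the API key is a placeholder you supply.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://api.cekura.ai/test_framework/v1/results"


def build_result_request(result_id: int, api_key: str) -> urllib.request.Request:
    """Build the GET request for a single result (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{result_id}/",
        headers={"X-CEKURA-API-KEY": api_key},
        method="GET",
    )


def fetch_result(result_id: int, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_result_request(result_id, api_key)) as resp:
        return json.load(resp)
```

Calling `fetch_result(123, "<api-key>")` would return the response object shown above as a Python dictionary, assuming the key is valid and a result with that ID exists.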

Authorizations

X-CEKURA-API-KEY

string

header

required

API Key Authentication. It should be included in the header of each request.

Path Parameters

id

integer

required

A unique integer value identifying this result.

Response

agent

integer

required

id

integer

read-only

name

string

Name of the result. Example: "Test Result 1"

Maximum string length: 255

agent_name

string

read-only

Name of the agent associated with this result

status

enum<string>

read-only

Current status of the result

running - Running
completed - Completed
failed - Failed
pending - Pending
in_progress - In Progress
evaluating - Evaluating
in_queue - In Queue
timeout - Timeout
cancelled - Cancelled
scaling_up - Scaling Up

Available options: running, completed, failed, pending, in_progress, evaluating, in_queue, timeout, cancelled, scaling_up
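When polling this endpoint, a client typically needs to know whether a status is final. The sketch below validates a status against the documented values and checks it against a set of terminal statuses. Which statuses count as terminal is an assumption made here for illustration; the list above does not say so. The name `is_terminal` is hypothetical.

```python
# All status values documented for a result.
ALL_STATUSES = {
    "running", "completed", "failed", "pending", "in_progress",
    "evaluating", "in_queue", "timeout", "cancelled", "scaling_up",
}

# Assumption: these four are treated as final states. The documentation
# lists the values but does not state which ones are terminal.
TERMINAL_STATUSES = {"completed", "failed", "timeout", "cancelled"}


def is_terminal(status: str) -> bool:
    """Return True if the status is assumed to be final."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in TERMINAL_STATUSES
```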

met_expected_outcome_count

integer

read-only

Number of runs that fully met their expected outcomes with a score of 100

total_expected_outcome_count

integer

read-only

Total number of runs that had expected outcomes defined
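The two counts above can be combined into a ratio. This is a small sketch, not part of the API: the function name `expected_outcome_rate` is hypothetical, and the values are passed through `int()` in case a client receives them as strings.

```python
from typing import Optional, Union


def expected_outcome_rate(
    met: Union[int, str], total: Union[int, str]
) -> Optional[float]:
    """Fraction of runs with expected outcomes that fully met them.

    Returns None when no run had expected outcomes defined.
    """
    met_count, total_count = int(met), int(total)
    if total_count == 0:
        return None
    return met_count / total_count
```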

success_rate

number<double>

read-only

Success rate of the test runs

run_as_text

boolean

read-only

Whether this test was run in text mode instead of voice mode. Example: true or false

is_cronjob

boolean

read-only

Whether this result was created by a scheduled cronjob

runs

object

read-only

Run objects keyed by run ID string (e.g. {"12345": {...}}). The list endpoint returns runs as an array of summaries instead.
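Because `runs` is an object keyed by run ID string rather than an array, clients iterate over its items. A minimal sketch, assuming each run carries the boolean `success` field shown in the response sample; the function name `failed_run_ids` is hypothetical.

```python
def failed_run_ids(result: dict) -> list[int]:
    """Collect the IDs of runs whose `success` flag is explicitly False."""
    runs = result.get("runs") or {}
    return sorted(
        int(run_id)  # keys are run ID strings, e.g. "12345"
        for run_id, run in runs.items()
        if run.get("success") is False
    )
```

Runs without a `success` value (for example, runs that have not finished) are skipped rather than counted as failures.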

overall_evaluation

any | null

Overall evaluation of the test runs. Example:

{
    "success_rate": "number",
    "metric_summary": {
      "metric_id": {
        "id": "integer",
        "name": "string",
        "type": "string",
        "score": "number",
        "explanation": "string (optional)",
        "function_name": "string",
        "vocera_defined_metric_code": "string (optional)",
        "p50": "number (for numeric metrics)"
      }
    },
    "worst_performing_metrics": {
      "binary_adherence": [
        "array of metric_ids"
      ]
    },
    "numeric_metrics": [
      {
        "name": "string",
        "type": "numeric",
        "value": "number",
        "percentiles": {
          "p50": "number"
        }
      }
    ],
    "enum_metrics": [
      "array of metric_ids"
    ],
    "extra_metrics": [
      {
        "name": "string (e.g., 'Expected Outcome', 'Average Ringing Duration')",
        "type": "string",
        "value": "number",
        "percentiles": {
          "p50": "number (optional)"
        }
      }
    ]
}
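As an illustration of reading this structure, the sketch below pulls the median (`p50`) of each entry in `numeric_metrics`. It assumes the shape shown in the example above and tolerates a null `overall_evaluation`; the function name `numeric_p50s` is hypothetical.

```python
from typing import Optional


def numeric_p50s(overall_evaluation: Optional[dict]) -> dict:
    """Map each numeric metric name to its p50 value, where present."""
    if not overall_evaluation:
        return {}  # the field is nullable
    medians = {}
    for metric in overall_evaluation.get("numeric_metrics", []):
        p50 = (metric.get("percentiles") or {}).get("p50")
        if p50 is not None:
            medians[metric["name"]] = p50
    return medians
```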

total_duration

string

read-only

Total duration of the test runs for this result, in MM:SS format. Example: 22:30
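A duration in this format can be converted to seconds for arithmetic or sorting. The sketch below assumes exactly two colon-separated parts (minutes and seconds) with no hours component; the function name `duration_to_seconds` is hypothetical.

```python
def duration_to_seconds(value: str) -> int:
    """Convert an MM:SS string such as "22:30" to a number of seconds."""
    minutes, seconds = value.split(":")  # raises ValueError on other shapes
    return int(minutes) * 60 + int(seconds)
```

For the example value, `duration_to_seconds("22:30")` returns 1350.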

total_runs_count

integer

read-only

Total number of test runs associated with this result. Example: 10

completed_runs_count

integer

read-only

Number of test runs that have completed successfully. Example: 10

success_runs_count

integer

read-only

Number of test runs that were marked as successful. Example: 10

failed_runs_count

integer

read-only

Number of test runs that failed or encountered errors. Example: 10

connected_runs

array

read-only

List of run IDs that connected successfully (have transcript data). Returns an empty list if there are no connected runs.

failed_infrastructure_runs

array | null

read-only

List of run IDs that failed the infrastructure issues metric (score = 0). Returns null if the metric is not found, and an empty list if the metric exists but no runs failed it.

failed_workflow_runs

array | null

read-only

List of run IDs that failed the expected outcome metric (score = 0). Returns null if the metric is not found, and an empty list if the metric exists but no runs failed it.
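For `failed_infrastructure_runs` and `failed_workflow_runs`, null and an empty list mean different things, so a plain truthiness check loses information. A minimal sketch of handling the three cases; the function name `describe_failures` and the returned strings are hypothetical.

```python
from typing import Optional


def describe_failures(run_ids: Optional[list]) -> str:
    """Distinguish a missing metric from a metric with no failures."""
    if run_ids is None:
        return "metric not found"      # null: the metric was not evaluated
    if len(run_ids) == 0:
        return "no failures"           # empty list: metric exists, all passed
    return f"{len(run_ids)} failed run(s)"
```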

successful_calls

array

read-only

List of run IDs that completed successfully. Returns an empty list if there are no successful runs.

scenarios

array

read-only

List of scenarios (ID and name) used in the test runs for this result. Example:

[
    {
        "id": 123,
        "name": "Scenario 1"
    },
    {
        "id": 456,
        "name": "Scenario 2"
    }
]

critical_categories

array

read-only

List of critical categories for this result. Example:

[
    {
      "id": 2950,
      "name": "Pronunciation Analysis",
      "eval_type": "continuous_qualitative",
      "simulation_enabled": true,
      "observability_enabled": true
    },
    {
      "id": 3284,
      "name": "Latency",
      "eval_type": "numeric",
      "simulation_enabled": true,
      "observability_enabled": false
    },
    {
      "id": 3295,
      "name": "Detect Silence in Conversation",
      "eval_type": "binary_qualitative",
      "simulation_enabled": true,
      "observability_enabled": true
    }
  ]

metrics

array

read-only

runs_by_tags

object

read-only

latency_data

object

read-only

failed_reasons

any | null

Reasons the test runs failed. Example:

{
    "issues": [
      {
        "rank": 1,
        "run_ids": [
          34588
        ],
        "description": "The agent did not provide the standard greeting, emergency disclaimer, or ask how they could help.",
        "affected_count": 1
      },
      {
        "rank": 2,
        "run_ids": [
          34588
        ],
        "description": "The agent did not explain the distinction between the primary care and express clinics.",
        "affected_count": 1
      }
    ],
    "total_failed_runs": 1
  }
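The `issues` list can be turned into a short, ranked report. The sketch below follows the object shape of the example above (`issues` with `rank`, `description`, and `affected_count`) and tolerates a null value; the function name `summarize_failed_reasons` is hypothetical.

```python
from typing import Optional


def summarize_failed_reasons(failed_reasons: Optional[dict]) -> list[str]:
    """Return one line per issue, ordered by rank."""
    if not failed_reasons:
        return []  # the field is nullable
    issues = sorted(failed_reasons.get("issues", []), key=lambda i: i["rank"])
    return [
        f'{issue["rank"]}. {issue["description"]} '
        f'({issue["affected_count"]} run(s) affected)'
        for issue in issues
    ]
```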

ai_summary

any | null

LLM-generated narrative summary of the result, structured for the Results-page UI. Example:

{
    "what_happened": "8 of 10 runs failed, 2 passed. All 8 failures cluster into 3 distinct causes.",
    "why_it_happened": "The default test profile isn't seeded in the production backend, so every auth-required workflow exits before PIN/address steps run.",
    "how_to_fix": "Seed the test profile in production, tighten the competitor-mention guardrail, and add CA-only store phrasing to the FAQ flow.",
    "generated_from_runs_count": 10,
    "generated_at": "2026-05-25T12:00:00Z"
}

next_steps

any | null

LLM-suggested next steps to fix the issues found in this result, ordered by impact. Example:

[
    {"title": "Seed test profile in production", "description": "Unblocks 6 runs; ~5 min."},
    {"title": "Tighten competitor guardrail", "description": "1 prompt edit; review drift run."}
]
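Since the steps arrive already ordered by impact, a client can render them as a numbered checklist without re-sorting. A minimal sketch, assuming each entry has the `title` and `description` keys shown in the example; the function name `next_steps_checklist` is hypothetical.

```python
from typing import Optional


def next_steps_checklist(next_steps: Optional[list]) -> list[str]:
    """Number the suggested steps in the order the API returned them."""
    return [
        f'{position}. {step["title"]}: {step["description"]}'
        for position, step in enumerate(next_steps or [], start=1)
    ]
```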

performance_metrics

string

read-only

Per-rubric-rule performance breakdown. One entry per rule in the project's rubric_config: metric name, aggregated value across the result's runs, and whether that aggregate meets the rule's conditions.

run_type

string | null

read-only

Run Execution Type

run_settings

any | null

Snapshot of the override-relevant request payload (personality_ids, frequency, concurrency_limit, connection-specific overrides, etc.).

created_at

string<date-time>

Timestamp when this test result was created. Example: 2021-01-01 00:00:00

updated_at

string<date-time>

read-only

Timestamp when this test result was last updated. Example: 2021-01-01 00:00:00
