Get Result
Retrieve a test result by ID
Authorizations
API key authentication. Include the API key in the header of each request.
Path Parameters
A unique integer value identifying this result.
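A minimal sketch of how such a request might be assembled, using only the Python standard library. The source does not state the base URL, the route, or the header name, so `BASE_URL`, `RESULT_PATH`, and `API_KEY_HEADER` below are hypothetical placeholders to replace with the real values. The function only builds the request; it does not send it.

```python
import urllib.request

# Hypothetical placeholders: the source does not give the base URL,
# the route, or the header name. Replace all three with real values.
BASE_URL = "https://api.example.com"
RESULT_PATH = "/results/{result_id}/"
API_KEY_HEADER = "X-API-KEY"


def build_get_result_request(result_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single result."""
    # The path parameter is documented as an integer, so reject anything else.
    if isinstance(result_id, bool) or not isinstance(result_id, int):
        raise TypeError("result_id must be an integer")
    url = BASE_URL + RESULT_PATH.format(result_id=result_id)
    # The API key travels in a request header, as the Authorizations note says.
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key}, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would send it once the placeholders point at the real service.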
Response
Name of the result
Example: "Test Result 1"
Maximum length: 255
Name of the agent associated with this result
Current status of the result
Available options:
running - Running
completed - Completed
failed - Failed
pending - Pending
in_progress - In Progress
evaluating - Evaluating
in_queue - In Queue
timeout - Timeout
cancelled - Cancelled
scaling_up - Scaling Up
Number of runs that fully met their expected outcomes with a score of 100
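A client that polls a result usually needs to know when to stop. The sketch below groups the ten status values listed in this section. Which statuses are final is an assumption here, since the source does not say; adjust `ASSUMED_TERMINAL` to match the service's actual behavior.

```python
# The ten status values documented for a result.
ALL_STATUSES = frozenset({
    "running", "completed", "failed", "pending", "in_progress",
    "evaluating", "in_queue", "timeout", "cancelled", "scaling_up",
})

# Assumption: these four statuses mean the result will not change again.
# The source does not state which statuses are final.
ASSUMED_TERMINAL = frozenset({"completed", "failed", "timeout", "cancelled"})


def is_finished(status: str) -> bool:
    """Return True if the status is one of the assumed-final values."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in ASSUMED_TERMINAL
```

Rejecting unknown values up front makes a newly introduced status visible instead of silently treating it as unfinished.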
Total number of runs that had expected outcomes defined
Success rate of the test runs
Whether this test was run in text mode instead of voice mode
Example: true or false
Whether this result was created by a scheduled cronjob
Run objects keyed by run ID string (e.g. {"12345": {...}}). The list endpoint returns runs as an array of summaries instead.
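Because this endpoint keys runs by run ID string while the list endpoint returns an array, client code may want one shape for both. A minimal sketch, assuming only the `{"12345": {...}}` layout described above; the helper name `runs_as_list` is illustrative and not part of the API.

```python
from typing import Any


def runs_as_list(runs_by_id: dict[str, dict[str, Any]]) -> list[tuple[int, dict[str, Any]]]:
    """Turn {"12345": {...}} into [(12345, {...}), ...] ordered by run ID."""
    # Keys are run IDs serialized as strings, so convert them back to integers.
    pairs = [(int(run_id), run) for run_id, run in runs_by_id.items()]
    # Sort on the numeric ID only; the run objects themselves are not comparable.
    return sorted(pairs, key=lambda pair: pair[0])
```

Sorting numerically avoids the string ordering in which "20" would come before "3".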
Overall evaluation of the test runs
Example:
```
{
  "success_rate": "number",
  "metric_summary": {
    "metric_id": {
      "id": "integer",
      "name": "string",
      "type": "string",
      "score": "number",
      "explanation": "string (optional)",
      "function_name": "string",
      "vocera_defined_metric_code": "string (optional)",
      "p50": "number (for numeric metrics)"
    }
  },
  "worst_performing_metrics": {
    "binary_adherence": [
      "array of metric_ids"
    ]
  },
  "numeric_metrics": [
    {
      "name": "string",
      "type": "numeric",
      "value": "number",
      "percentiles": {
        "p50": "number"
      }
    }
  ],
  "enum_metrics": [
    "array of metric_ids"
  ],
  "extra_metrics": [
    {
      "name": "string (e.g., 'Expected Outcome', 'Average Ringing Duration')",
      "type": "string",
      "value": "number",
      "percentiles": {
        "p50": "number (optional)"
      }
    }
  ]
}
```
Total duration of the test runs for this result
Example: 22:30
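The overall evaluation object shown earlier nests per-metric entries under `metric_summary`, keyed by metric ID. A minimal sketch of scanning it for low-scoring metrics follows. The keys `metric_summary`, `name`, and `score` come from that example; the `threshold` parameter and the helper name are illustrative assumptions, and the source does not state the score scale.

```python
from typing import Any


def metrics_below(evaluation: dict[str, Any], threshold: float) -> list[str]:
    """Return the names of metrics whose score is below the threshold."""
    names = []
    for metric in evaluation.get("metric_summary", {}).values():
        score = metric.get("score")
        # Skip entries whose score is missing or not a number.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue
        if score < threshold:
            names.append(metric.get("name", "<unnamed>"))
    return sorted(names)
```

Returning names rather than IDs keeps the output readable; swap in `metric["id"]` if the caller needs to join against other fields.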
Total number of test runs associated with this result
Example: 10
Number of test runs that have completed successfully
Example: 10
Number of test runs that were marked as successful
Example: 10
Number of test runs that failed or encountered errors
Example: 10
List of run IDs that got connected successfully (have transcript data). Returns empty list if no connected runs.
List of run IDs that failed the infrastructure issues metric (score = 0). Returns null if the metric is not found, or an empty list if the metric exists but no runs failed it.
List of run IDs that failed the expected outcome metric (score = 0). Returns null if the metric is not found, or an empty list if the metric exists but no runs failed it.
List of run IDs that completed successfully. Returns empty list if no successful runs.
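Two of the run ID lists above distinguish null (metric not found) from an empty list (metric exists, no failures), so a plain truthiness check would merge the two cases. A minimal sketch of handling them separately; the function name and the returned messages are illustrative only.

```python
from typing import Optional


def summarize_failed_runs(run_ids: Optional[list[int]]) -> str:
    """Describe a failed-run list, keeping null and empty distinct."""
    # null in the response: the metric was not found for this result.
    if run_ids is None:
        return "metric not found"
    # Empty list: the metric exists and no run failed it.
    if not run_ids:
        return "metric present, no failures"
    return f"{len(run_ids)} failed run(s): {sorted(run_ids)}"
```

Checking `is None` before the emptiness test is what preserves the distinction.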
List of scenarios (ID and name) used in the test runs for this result
Example:
```
[
  {
    "id": 123,
    "name": "Scenario 1"
  },
  {
    "id": 456,
    "name": "Scenario 2"
  }
]
```
List of critical categories for this result
Example:
```
[
  {
    "id": 2950,
    "name": "Pronunciation Analysis",
    "eval_type": "continuous_qualitative",
    "simulation_enabled": true,
    "observability_enabled": true
  },
  {
    "id": 3284,
    "name": "Latency",
    "eval_type": "numeric",
    "simulation_enabled": true,
    "observability_enabled": false
  },
  {
    "id": 3295,
    "name": "Detect Silence in Conversation",
    "eval_type": "binary_qualitative",
    "simulation_enabled": true,
    "observability_enabled": true
  }
]
```
Failed reasons of the test runs
Example:
```
{
  "issues": [
    {
      "rank": 1,
      "run_ids": [
        34588
      ],
      "description": "The agent did not provide the standard greeting, emergency disclaimer, or ask how they could help.",
      "affected_count": 1
    },
    {
      "rank": 2,
      "run_ids": [
        34588
      ],
      "description": "The agent did not explain the distinction between the primary care and express clinics.",
      "affected_count": 1
    }
  ],
  "total_failed_runs": 1
}
```
LLM-generated narrative summary of the result, structured for the Results-page UI.
Example:
```
{
  "what_happened": "8 of 10 runs failed, 2 passed. All 8 failures cluster into 3 distinct causes.",
  "why_it_happened": "The default test profile isn't seeded in the production backend, so every auth-required workflow exits before PIN/address steps run.",
  "how_to_fix": "Seed the test profile in production, tighten the competitor-mention guardrail, and add CA-only store phrasing to the FAQ flow.",
  "generated_from_runs_count": 10,
  "generated_at": "2026-05-25T12:00:00Z"
}
```
LLM-suggested next steps to fix the issues found in this result, ordered by impact.
Example:
```
[
  {"title": "Seed test profile in production", "description": "Unblocks 6 runs; ~5 min."},
  {"title": "Tighten competitor guardrail", "description": "1 prompt edit; review drift run."}
]
```
Per-rubric-rule performance breakdown. One entry per rule in the project's rubric_config: metric name, aggregated value across the result's runs, and whether that aggregate meets the rule's conditions.
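The failed-reasons object shown earlier in this section gives each issue a `rank`, a `description`, an `affected_count`, and a list of `run_ids`. A minimal sketch of turning that object into short report lines, using only those keys; the helper name, the `limit` parameter, and the output format are illustrative assumptions.

```python
from typing import Any


def top_issues(failed_reasons: dict[str, Any], limit: int = 3) -> list[str]:
    """Format the highest-ranked issues as one line each."""
    # Rank 1 is listed first in the example, so sort ascending by rank.
    issues = sorted(failed_reasons.get("issues", []), key=lambda issue: issue["rank"])
    return [
        f'#{issue["rank"]} ({issue["affected_count"]} run(s)): {issue["description"]}'
        for issue in issues[:limit]
    ]
```

Sorting explicitly means the output does not depend on the order in which the service happens to return the issues.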
Run Execution Type
Snapshot of the override-relevant request payload (personality_ids, frequency, concurrency_limit, connection-specific overrides, etc.).
Timestamp when this test result was created
Example: 2021-01-01 00:00:00
Timestamp when this test result was last updated
Example: 2021-01-01 00:00:00
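The created and updated timestamps are shown in the form `2021-01-01 00:00:00`. A minimal sketch of parsing that form and measuring the gap between the two values follows. It assumes the response always uses exactly the format of the example, and it treats the values as naive datetimes because the source does not name a time zone.

```python
from datetime import datetime

# Assumption: timestamps always match the documented example format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2021-01-01 00:00:00' into a naive datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def seconds_between(created: str, updated: str) -> float:
    """Return the number of seconds from the created to the updated timestamp."""
    return (parse_timestamp(updated) - parse_timestamp(created)).total_seconds()
```

The `generated_at` value in the narrative summary example uses a different, ISO 8601 style (`2026-05-25T12:00:00Z`), so it would need its own parsing.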