> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cekura.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Get Result

> Retrieve a test result by ID
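
As a rough sketch of calling this endpoint, the snippet below builds the request from the server URL, path, and `X-CEKURA-API-KEY` header declared in the OpenAPI definition on this page. The helper names `build_result_request` and `fetch_result` are hypothetical and not part of any Cekura SDK; only the Python standard library is assumed.

```python
import json
import urllib.request

# Taken from the `servers` entry of the OpenAPI definition on this page.
BASE_URL = "https://api.cekura.ai"


def build_result_request(result_id: int, api_key: str) -> urllib.request.Request:
    """Build a GET request for /test_framework/v1/results/{id}/ (hypothetical helper)."""
    url = f"{BASE_URL}/test_framework/v1/results/{result_id}/"
    return urllib.request.Request(
        url,
        headers={"X-CEKURA-API-KEY": api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_result(result_id: int, api_key: str) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for non-2xx responses such as 404.
    """
    with urllib.request.urlopen(build_result_request(result_id, api_key)) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the URL and header easy to inspect without contacting the API.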



## OpenAPI

````yaml get /test_framework/v1/results/{id}/
openapi: 3.1.0
info:
  title: Cekura API
  version: v1
  description: >-
    Complete API documentation for the Cekura platform. This API provides
    endpoints for testing, observing, and evaluating AI voice agents — including
    managing agents, running evaluators, defining metrics, and analyzing call
    quality.
servers:
  - url: https://api.cekura.ai
security: []
paths:
  /test_framework/v1/results/{id}/:
    get:
      tags:
        - test_framework
      summary: Retrieve a test result
      description: >-
        `success` indicates whether the run passed its quality evaluation. A run
        with `success=false` either failed to connect (check `error_message`) or
        failed an evaluation check (check `evaluation`). `success_rate` is the
        percentage of runs with `success=true`.


        The `runs` field is a dict keyed by run ID string (e.g. `{"86368":
        {...}, "86367": {...}}`), not an array, unlike the list endpoint. Each
        run also has a `call_id` string field (the provider's call identifier).
        It is a data field, not an endpoint ID; do not pass it where `run_id`
        is expected.
      operationId: results-retrieve
      parameters:
        - in: path
          name: id
          schema:
            type: integer
          description: A unique integer value identifying this result.
          required: true
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ResultDetailV2'
              examples:
                Success:
                  value:
                    id: integer
                    name: string
                    agent: integer
                    status: string
                    met_expected_outcome_count: integer
                    total_expected_outcome_count: integer
                    success_rate: float
                    run_as_text: boolean
                    is_cronjob: boolean
                    runs:
                      run_id:
                        id: integer
                        scenario: integer
                        outbound_number: string
                        expected_outcome:
                          score: 100
                          explanation:
                            - >-
                              ✅ Positive outcome explanation with checkmark
                              emoji
                            - ❌ Negative outcome explanation with X emoji
                          outcome_alignments:
                            - outcome: string
                              prompt_part: string
                              aligned: boolean
                        success: boolean
                        evaluation:
                          metrics:
                            - id: integer
                              name: string
                              type: >-
                                binary_workflow_adherence | binary_qualitative |
                                continuous_qualitative | numeric | enum
                              score: number
                              explanation: string | array
                              function_name: string (optional)
                              extra:
                                categories:
                                  - category: string
                                    deviation: string (optional)
                                    priority: string (optional)
                                percentiles:
                                  p50: number
                              enum: string (for enum type metrics only)
                        timestamp: datetime
                        executed_at: datetime
                        error_message: string
                        status: string
                        duration: string (MM:SS format)
                        scenario_name: string
                        personality_name: string
                        metadata: object
                        inbound_number: string
                    overall_evaluation:
                      success_rate: number
                      metric_summary:
                        metric_id:
                          id: integer
                          name: string
                          type: string
                          score: number
                          explanation: string (optional)
                          function_name: string
                          vocera_defined_metric_code: string (optional)
                          p50: number (for numeric metrics)
                      worst_performing_metrics:
                        binary_adherence:
                          - array of metric_ids
                      numeric_metrics:
                        - name: string
                          type: numeric
                          value: number
                          percentiles:
                            p50: number
                      enum_metrics:
                        - array of metric_ids
                      extra_metrics:
                        - name: >-
                            string (e.g., 'Expected Outcome', 'Average Ringing
                            Duration')
                          type: string
                          value: number
                          percentiles:
                            p50: number (optional)
                    total_duration: string (MM:SS format)
                    total_runs_count: integer
                    completed_runs_count: integer
                    success_runs_count: integer
                    failed_runs_count: integer
                    scenarios:
                      - id: integer
                        name: string
                    critical_categories: array
                    metrics: array
                    domain: string (nullable)
                    domain_logo: string (nullable)
                    runs_by_tags: object
                    latency_data: object
                    failed_reasons: object (nullable)
                    run_settings:
                      override_value: 10
                      frequency: 2
                      concurrency_limit: 5
                      personality_ids:
                        - 3
                        - 7
                      test_profile_ids:
                        - 20
                      mode: same_number
                      livekit_data:
                        agent_name: kit
                        config: {}
                        url: wss://example
                    created_at: datetime
                    updated_at: datetime
          description: The requested test result.
        '404':
          content:
            application/json:
              schema:
                type: object
                properties:
                  detail:
                    type: string
          description: No result exists with the given ID.
      security:
        - api_key: []
components:
  schemas:
    ResultDetailV2:
      type: object
      properties:
        id:
          type: integer
          readOnly: true
        name:
          type: string
          description: |

            Name of the result
            Example: `"Test Result 1"`
          maxLength: 255
        agent:
          type: integer
        agent_name:
          type: string
          readOnly: true
          description: Name of the agent associated with this result
        status:
          enum:
            - running
            - completed
            - failed
            - pending
            - in_progress
            - evaluating
            - in_queue
            - timeout
            - cancelled
            - scaling_up
          type: string
          x-spec-enum-id: 623e22fc1eb17833
          readOnly: true
          description: |-

            Current status of the result


            * `running` - Running
            * `completed` - Completed
            * `failed` - Failed
            * `pending` - Pending
            * `in_progress` - In Progress
            * `evaluating` - Evaluating
            * `in_queue` - In Queue
            * `timeout` - Timeout
            * `cancelled` - Cancelled
            * `scaling_up` - Scaling Up
        met_expected_outcome_count:
          type: integer
          readOnly: true
          description: >-
            Number of runs that fully met their expected outcomes with a score
            of 100
        total_expected_outcome_count:
          type: integer
          readOnly: true
          description: Total number of runs that had expected outcomes defined
        success_rate:
          type: number
          format: double
          readOnly: true
          description: |

            Success rate of the test runs
        run_as_text:
          type: boolean
          readOnly: true
          description: |

            Whether this test was run in text mode instead of voice mode
            Example: `true` or `false`
        is_cronjob:
          type: boolean
          readOnly: true
          description: Whether this result was created by a scheduled cronjob
        runs:
          type: object
          additionalProperties: {}
          description: >-
            Run objects keyed by run ID string (e.g. {"12345": {...}}). The list
            endpoint returns `runs` as an array of summaries instead.
          readOnly: true
        overall_evaluation:
          oneOf:
            - {}
            - type: 'null'
          readOnly: true
          description: |

            Overall evaluation of the test runs
            Example:
            ```json
            {
                "success_rate": "number",
                "metric_summary": {
                  "metric_id": {
                    "id": "integer",
                    "name": "string",
                    "type": "string",
                    "score": "number",
                    "explanation": "string (optional)",
                    "function_name": "string",
                    "vocera_defined_metric_code": "string (optional)",
                    "p50": "number (for numeric metrics)"
                  }
                },
                "worst_performing_metrics": {
                  "binary_adherence": [
                    "array of metric_ids"
                  ]
                },
                "numeric_metrics": [
                  {
                    "name": "string",
                    "type": "numeric",
                    "value": "number",
                    "percentiles": {
                      "p50": "number"
                    }
                  }
                ],
                "enum_metrics": [
                  "array of metric_ids"
                ],
                "extra_metrics": [
                  {
                    "name": "string (e.g., 'Expected Outcome', 'Average Ringing Duration')",
                    "type": "string",
                    "value": "number",
                    "percentiles": {
                      "p50": "number (optional)"
                    }
                  }
                ]
            }
            ```
        total_duration:
          type: string
          readOnly: true
          description: |

            Total duration of the test runs for this result
            Example: `22:30`
        total_runs_count:
          type: integer
          readOnly: true
          description: |

            Total number of test runs associated with this result
            Example: `10`
        completed_runs_count:
          type: integer
          readOnly: true
          description: |

            Number of test runs that have completed successfully
            Example: `10`
        success_runs_count:
          type: integer
          readOnly: true
          description: |

            Number of test runs that were marked as successful
            Example: `10`
        failed_runs_count:
          type: integer
          readOnly: true
          description: |

            Number of test runs that failed or encountered errors
            Example: `10`
        connected_runs:
          type: string
          readOnly: true
          description: >-
            List of run IDs that got connected successfully (have transcript
            data). Returns empty list if no connected runs.
        failed_infrastructure_runs:
          type: string
          readOnly: true
          description: >-
            List of run IDs that failed the infrastructure issues metric (score=0).
            Returns null if metric not found, empty list if metric exists but no
            failures.
        failed_workflow_runs:
          type: string
          readOnly: true
          description: >-
            List of run IDs that failed the expected outcome metric (score=0).
            Returns null if metric not found, empty list if metric exists but no
            failures.
        successful_calls:
          type: string
          readOnly: true
          description: >-
            List of run IDs that completed successfully. Returns empty list if
            no successful runs.
        scenarios:
          type: string
          readOnly: true
          description: |

            List of scenarios (ID and name) used in the test runs for this result
            Example:
            ```json
            [
                {
                    "id": 123,
                    "name": "Scenario 1"
                },
                {
                    "id": 456,
                    "name": "Scenario 2"
                }
            ]
            ```
        critical_categories:
          type: string
          readOnly: true
          description: |

            List of critical categories for this result
            Example:
            ```json
            [
                {
                  "id": 2950,
                  "name": "Pronunciation Analysis",
                  "eval_type": "continuous_qualitative",
                  "simulation_enabled": true,
                  "observability_enabled": true
                },
                {
                  "id": 3284,
                  "name": "Latency",
                  "eval_type": "numeric",
                  "simulation_enabled": true,
                  "observability_enabled": false
                },
                {
                  "id": 3295,
                  "name": "Detect Silence in Conversation",
                  "eval_type": "binary_qualitative",
                  "simulation_enabled": true,
                  "observability_enabled": true
                }
              ]
            ```
        metrics:
          type: string
          readOnly: true
        runs_by_tags:
          type: string
          readOnly: true
        latency_data:
          type: string
          readOnly: true
        failed_reasons:
          oneOf:
            - {}
            - type: 'null'
          readOnly: true
          description: |

            Failed reasons of the test runs
            Example:
            ```json
            {
                "issues": [
                  {
                    "rank": 1,
                    "run_ids": [
                      34588
                    ],
                    "description": "The agent did not provide the standard greeting, emergency disclaimer, or ask how they could help.",
                    "affected_count": 1
                  },
                  {
                    "rank": 2,
                    "run_ids": [
                      34588
                    ],
                    "description": "The agent did not explain the distinction between the primary care and express clinics.",
                    "affected_count": 1
                  }
                ],
                "total_failed_runs": 1
              }
            ```
        ai_summary:
          oneOf:
            - {}
            - type: 'null'
          readOnly: true
          description: >

            LLM-generated narrative summary of the result, structured for the
            Results-page UI.

            Example:

            ```json

            {
                "what_happened": "8 of 10 runs failed, 2 passed. All 8 failures cluster into 3 distinct causes.",
                "why_it_happened": "The default test profile isn't seeded in the production backend, so every auth-required workflow exits before PIN/address steps run.",
                "how_to_fix": "Seed the test profile in production, tighten the competitor-mention guardrail, and add CA-only store phrasing to the FAQ flow.",
                "generated_from_runs_count": 10,
                "generated_at": "2026-05-25T12:00:00Z"
            }

            ```
        next_steps:
          oneOf:
            - {}
            - type: 'null'
          readOnly: true
          description: >

            LLM-suggested next steps to fix the issues found in this result,
            ordered by impact.

            Example:

            ```json

            [
                {"title": "Seed test profile in production", "description": "Unblocks 6 runs; ~5 min."},
                {"title": "Tighten competitor guardrail", "description": "1 prompt edit; review drift run."}
            ]

            ```
        performance_metrics:
          type: string
          readOnly: true
          description: >-
            Per-rubric-rule performance breakdown. One entry per rule in the
            project's rubric_config: metric name, aggregated value across the
            result's runs, and whether that aggregate meets the rule's
            conditions.
        run_type:
          type:
            - string
            - 'null'
          readOnly: true
          description: Run Execution Type
        run_settings:
          oneOf:
            - {}
            - type: 'null'
          description: >-
            Snapshot of the override-relevant request payload (personality_ids,
            frequency, concurrency_limit, connection-specific overrides, etc.).
        created_at:
          type: string
          format: date-time
          description: |

            Timestamp when this test result was created
            Example: `2021-01-01 00:00:00`
        updated_at:
          type: string
          format: date-time
          readOnly: true
          description: |

            Timestamp when this test result was last updated
            Example: `2021-01-01 00:00:00`
      required:
        - agent
  securitySchemes:
    api_key:
      type: apiKey
      name: X-CEKURA-API-KEY
      in: header
      description: >-
        API Key Authentication. It should be included in the header of each
        request.

````
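
The endpoint description above says that `runs` is a dict keyed by run ID string, and that a run with `success=false` either failed to connect (see `error_message`) or failed an evaluation check (see `evaluation`). A minimal sketch of sorting runs along those lines follows. The function name `triage_runs`, the group labels, and the sample payload values are all invented for illustration, and treating a non-empty `error_message` as the marker of a connection failure is an assumption rather than documented behavior.

```python
from typing import Any


def triage_runs(result: dict[str, Any]) -> dict[str, list[str]]:
    """Group run IDs by outcome (hypothetical helper, not part of the API)."""
    groups: dict[str, list[str]] = {
        "passed": [],
        "failed_to_connect": [],
        "failed_evaluation": [],
    }
    # On this endpoint `runs` is a dict keyed by run ID string, not an array.
    runs = result.get("runs") or {}
    for run_id, run in runs.items():
        if run.get("success"):
            groups["passed"].append(run_id)
        elif run.get("error_message"):
            # Assumption: a non-empty error_message means the run never connected.
            groups["failed_to_connect"].append(run_id)
        else:
            # Otherwise inspect run["evaluation"] for the failing check.
            groups["failed_evaluation"].append(run_id)
    return groups


# Illustrative payload only; every value here is made up for the sketch.
sample_result = {
    "id": 1,
    "runs": {
        "86368": {"id": 86368, "success": True, "error_message": "", "call_id": "call-a"},
        "86367": {"id": 86367, "success": False, "error_message": "", "call_id": "call-b"},
        "86366": {"id": 86366, "success": False, "error_message": "no answer", "call_id": "call-c"},
    },
}

groups = triage_runs(sample_result)
print(groups)
```

The dict keys are used as the run identifiers here; `call_id` is carried along only as data, in line with the note that it should not be passed where `run_id` is expected.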