Advanced Features of Metric Lab

Metric Lab provides powerful tools for defining, testing, and optimizing custom metrics for your AI assistant. This guide walks you through the process of refining metrics to ensure they accurately reflect real-world performance.

Metric Optimization Workflow

The workflow below walks through the full optimization cycle: define a metric, identify calls it misclassifies, build a test set from those calls, annotate the correct results, auto-improve the definition, and verify the improvement.

Defining Custom Metrics

First, define a custom metric tailored to your specific business logic. In this example, we will create a metric to verify appointment bookings.
  • Metric Name: Appointment Booked
  • Definition: Assesses whether the main agent successfully booked an appointment for the testing agent.
  • Success Criteria: An appointment was successfully booked.
  • Failure Criteria: No appointment was booked, or the process was incomplete.
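The fields above could be captured in code roughly like this; the `MetricDefinition` class is purely illustrative, not part of any Metric Lab API:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    """A custom metric as it might be modeled in code (illustrative only)."""
    name: str
    definition: str
    success_criteria: str
    failure_criteria: str

appointment_booked = MetricDefinition(
    name="Appointment Booked",
    definition=("Assesses whether the main agent successfully booked "
                "an appointment for the testing agent."),
    success_criteria="An appointment was successfully booked.",
    failure_criteria="No appointment was booked, or the process was incomplete.",
)
```

The key design point is that success and failure criteria are stated explicitly, which is what makes the metric refinable later.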

Identifying Metric Performance Issues

After your agent runs, you might notice discrepancies where the metric result doesn’t match reality.
  • False Positive: The metric says “Appointment Booked” (Success), but the agent actually failed to confirm the time.
  • False Negative: The metric says “Not Booked” (Failure), even though the agent successfully completed the booking.
These inconsistencies are signals that your metric definition (prompt or code) needs refinement.
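As a sketch, the two error types boil down to a comparison between what the metric reported and what actually happened on the call (the `classify_result` helper is hypothetical):

```python
def classify_result(metric_says_booked: bool, actually_booked: bool) -> str:
    """Categorize a single metric evaluation against ground truth."""
    if metric_says_booked and not actually_booked:
        return "false positive"   # metric reported Success, but no booking happened
    if not metric_says_booked and actually_booked:
        return "false negative"   # metric reported Failure, but the booking succeeded
    return "correct"

# Example: the metric said "Appointment Booked" but the time was never confirmed.
print(classify_result(metric_says_booked=True, actually_booked=False))  # → false positive
```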

Add to Labs

To fix these issues, you need to create a test set from these problematic calls.
  • Locate the specific calls in your logs where the metric failed.
  • Select them and click Add to Lab.
  • Assign them to the “Appointment Booked” metric lab.

Initial Run

Once inside the Lab, run your current metric against the newly added test set to establish a baseline.
  • Click the Run button in the Metric Lab.
  • The system will evaluate the “Appointment Booked” metric against all the calls in your test set.
  • Review the Overall Score (e.g., 3/5) to see the current accuracy.
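The Overall Score is effectively the fraction of test-set calls where the metric's actual value matches the expected value. A minimal sketch, assuming results arrive as (actual, expected) pairs (the `overall_score` helper is illustrative, not a Metric Lab API):

```python
def overall_score(results):
    """results: list of (actual_value, expected_value) pairs from the table view."""
    correct = sum(1 for actual, expected in results if actual == expected)
    return f"{correct}/{len(results)}"

# A hypothetical 5-call test set where the metric got 3 calls right.
baseline = [(True, True), (False, True), (True, True), (True, False), (False, False)]
print(overall_score(baseline))  # → 3/5
```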

Annotate

This is the most critical step. You must tell the system what the correct result should have been for each failed call.
  • Scroll down to the Table View.
  • Look for rows where Actual Value (what the metric thought) differs from what really happened.
  • Update Expected Value: Change the expected status to the correct one (e.g., change “False” to “True” if it was actually booked).
  • Add Notes: Click the feedback/notes icon and explain why the metric was wrong (e.g., “The user implicitly confirmed the time by saying ‘Sounds good’, so this should count as a booking”).
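Conceptually, each annotated row pairs the metric's output with your corrected ground truth plus an optional note. A hypothetical `Annotation` record (field names are assumptions, not a Metric Lab schema) might look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    call_id: str
    actual_value: bool        # what the metric produced
    expected_value: bool      # the corrected ground truth you set
    notes: Optional[str] = None

    @property
    def is_mismatch(self) -> bool:
        return self.actual_value != self.expected_value

ann = Annotation(
    call_id="call_123",
    actual_value=False,
    expected_value=True,
    notes="The user implicitly confirmed the time by saying 'Sounds good', "
          "so this should count as a booking.",
)
```

The note matters as much as the corrected value: it tells the optimizer *why* the metric was wrong, not just *that* it was wrong.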

Auto Improve

Instead of rewriting the prompt manually, let the Metric Lab’s optimizer do the work for you.
  • Ensure you have annotated the mismatches and added notes.
  • Click the Auto Improve button at the top right.
  • The system will analyze the transcripts, your notes, and the expected outcomes to generate a better metric definition.
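The exact optimizer mechanism isn't documented, but conceptually it needs the current definition plus your annotated mismatches and notes as context for proposing a revision. A sketch of assembling that context (the `build_optimizer_context` helper is hypothetical):

```python
def build_optimizer_context(current_definition, annotations):
    """Gather only the annotated mismatches into a plain-text context
    that an LLM-based optimizer could use to propose a revised metric.
    (Hypothetical helper; the real optimizer's inputs are not documented.)"""
    mismatches = [a for a in annotations if a["actual"] != a["expected"]]
    lines = [f"Current definition: {current_definition}"]
    for a in mismatches:
        lines.append(
            f"- Call {a['call_id']}: metric said {a['actual']}, "
            f"expected {a['expected']}. Note: {a['note']}"
        )
    return "\n".join(lines)

context = build_optimizer_context(
    "Assesses whether the main agent successfully booked an appointment.",
    [{"call_id": "call_123", "actual": False, "expected": True,
      "note": "'Sounds good' is an implicit confirmation."}],
)
```

This is why the annotation step is a prerequisite: without corrected expected values and notes, the optimizer has nothing to learn from.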

View, Analyze and Save

Once the Auto Improve task is complete (indicated by a green checkmark in the progress panel), you can review the proposed changes. Verify that they make sense and align with your intended definition of the metric.
  • Click View Changes on the completed task.
  • Diff View: Inspect the highlighted changes in the Description or Prompt fields. You might see that the system added specific instructions like “Consider ‘Sounds good’ as a valid confirmation.”
  • Review Table: Scan the table to ensure the new logic fixes the previous errors without breaking correct rows.
  • Click Save to apply the optimized metric definition.
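The review-table scan amounts to a small regression test: the new definition should fix previously wrong rows without flipping previously correct ones. A sketch, assuming each row carries the old and new metric outputs plus the expected value (the `regression_check` helper and its field names are illustrative):

```python
def regression_check(rows):
    """Compare old vs. new metric outputs against annotated expected values.
    Returns (newly_fixed, newly_broken) lists of call IDs."""
    fixed = [r["call_id"] for r in rows
             if r["old_actual"] != r["expected"] and r["new_actual"] == r["expected"]]
    broken = [r["call_id"] for r in rows
              if r["old_actual"] == r["expected"] and r["new_actual"] != r["expected"]]
    return fixed, broken
```

A save is safe when the newly-broken list comes back empty.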

Re-run and Observe Improvement

Finally, verify that your new metric is robust.
  • The metric is now updated with the optimized logic.
  • Future calls will be evaluated using this smarter definition.
  • You can manually Run the evaluation again to confirm the score has improved (e.g., 5/5).

Benefits of Metric Optimization

This iterative optimization process allows you to:
  • Improve metric accuracy from as low as 50% to 95% or higher
  • Ensure the labels you see from your AI assistant accurately reflect real performance
  • Make data-driven decisions based on reliable metrics