Overview

Voice evaluations allow you to assess audio content through human annotation tasks. This guide walks you through the complete process of uploading audio files and creating evaluations with timestamp-based questions, such as identifying when mispronunciations occur.
Examples in this guide use the scale-gp-beta package, which runs exclusively on the V5 API.

Step 1: Upload Audio Files

Before creating a voice evaluation, you need to upload your audio files to the platform. There are two methods available:

Method 1: Direct File Upload

Upload files directly from your local filesystem:
from scale_gp_beta import SGPClient

client = SGPClient(account_id=..., api_key=...)

# Upload a single audio file
audio_file_1 = client.files.create(
  file=open("audio_sample_1.mp3", "rb"),
  purpose="evaluation"
)

audio_file_2 = client.files.create(
  file=open("audio_sample_2.mp3", "rb"),
  purpose="evaluation"
)

# Store the file IDs for later use
file_id_1 = audio_file_1.id
file_id_2 = audio_file_2.id
See the Upload File API reference for more details.
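
If you have many samples to upload, a short loop keeps things manageable. A minimal sketch, assuming your MP3 files live in a local audio/ directory (the directory name is illustrative):
from pathlib import Path

# Upload every MP3 in the directory and collect the returned
# file IDs in a stable, sorted order
file_ids = []
for path in sorted(Path("audio").glob("*.mp3")):
  with open(path, "rb") as f:
    uploaded = client.files.create(file=f, purpose="evaluation")
  file_ids.append(uploaded.id)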

Method 2: Import from Cloud Storage

If your files are already stored in cloud storage (S3, GCS, etc.), you can import them in batch:
from scale_gp_beta import SGPClient

client = SGPClient(account_id=..., api_key=...)

# Import files from cloud storage
import_result = client.files.cloud_imports.create(
  files=[
    {
      "container": "my-s3-bucket",
      "filename": "audio_sample_1.mp3",
      "filepath": "evaluations/audio/audio_sample_1.mp3",
      "file_type": "audio/mpeg"
    },
    {
      "container": "my-s3-bucket",
      "filename": "audio_sample_2.mp3",
      "filepath": "evaluations/audio/audio_sample_2.mp3",
      "file_type": "audio/mpeg"
    }
  ]
)

# Extract file IDs from the import results
file_id_1 = import_result.results[0].file.id
file_id_2 = import_result.results[1].file.id
See the Import Files API reference for more details.
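
For larger imports, you can collect every file ID in one pass instead of indexing results individually. A minimal sketch, assuming results come back in the order they were submitted, as the indexed access above implies:
# Gather all imported file IDs in submission order
file_ids = [result.file.id for result in import_result.results]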

Step 2: Create a Question

Create a timestamp-based question that annotators will use to mark specific points in the audio:
# Create a timestamp question for identifying mispronunciations
question = client.questions.create(
  type="timestamp",
  title="Mispronunciation",
  prompt="When did mispronunciation occur?",
  multi=True  # Set to False if only one timestamp is expected
)

# Store the question ID
question_id = question.id
Set multi=True if you want annotators to be able to mark multiple timestamps in a single audio file. For voice evaluations, common question types include the following (see the sketch after this list for creating several at once):
  • Mispronunciations: When did the speaker mispronounce words?
  • Disfluencies: When did stuttering or filler words occur?
  • Emotion changes: When did the speaker’s tone shift?
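
If you want to capture more than one of these signals in the same evaluation, you can create one timestamp question per category up front. A minimal sketch reusing the questions.create call shown above; the titles and prompts are illustrative:
# Create one timestamp question per annotation category
question_specs = [
  ("Mispronunciation", "When did the speaker mispronounce words?"),
  ("Disfluency", "When did stuttering or filler words occur?"),
  ("Emotion change", "When did the speaker's tone shift?"),
]

question_ids = {}
for title, prompt in question_specs:
  q = client.questions.create(
    type="timestamp",
    title=title,
    prompt=prompt,
    multi=True  # allow multiple marks per audio file
  )
  question_ids[title] = q.id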

Step 3: Create the Voice Evaluation

Evaluations and datasets that contain files must be created via the API or SDK; this functionality is not currently supported in the UI.

Now combine the uploaded files and question into a complete evaluation:
# Create the voice evaluation
voice_evaluation = client.evaluations.create(
  name="Voice Quality Evaluation - Mispronunciation Detection",
  description="Evaluate audio samples for mispronunciations",
  data=[
    { "audio_description": "Customer service call 1" },
    { "audio_description": "Customer service call 2" },
  ],
  files=[
    { "audio_file": file_id_1 },
    { "audio_file": file_id_2 },
  ],
  tasks=[
    {
      "task_type": "contributor_evaluation.question",
      "alias": "mispronunciation_timestamps",
      "configuration": {
        "question_id": question_id,
        "queue_id": "default",
        "layout": {
          "direction": "column",
          "children": [
            {
              "data": "item.audio_description"
            },
            {
              "data": "item.files.audio_file"
            }
          ]
        },
        "required": False
      }
    }
  ]
)

print(f"Evaluation created with ID: {voice_evaluation.id}")

Understanding the Structure

  • data: Each object represents metadata for one audio sample. The items in data have a one-to-one mapping with items in files.
  • files: Each object contains file IDs that correspond to the uploaded audio files. The first item in files pairs with the first item in data to form the first evaluation row.
  • tasks: Defines what annotators will do. Here, we use a contributor_evaluation.question task with a timestamp question.
See the Create Evaluation API reference for more details.
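
Because data and files are parallel lists, it is often easiest to build both from a single manifest so the one-to-one pairing cannot drift. A minimal sketch, assuming the (description, file ID) pairs produced in Step 1:
# Build the parallel data/files lists from one manifest so each
# description stays aligned with its audio file
samples = [
  ("Customer service call 1", file_id_1),
  ("Customer service call 2", file_id_2),
]

data = [{"audio_description": description} for description, _ in samples]
files = [{"audio_file": file_id} for _, file_id in samples]
These lists can then be passed as data=data and files=files in the evaluations.create call above.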

Next Steps

After creating your voice evaluation:
  1. Monitor Progress: Check the evaluation status to see when annotation tasks are completed (see the sketch after this list)
  2. Annotate: Contributors work through the queue and answer the questions and tasks you have created
  3. Review Results: Access the annotated timestamps through the evaluation items
  4. Analyze Data: Export results to analyze patterns in voice quality
For more information on working with evaluations, see the Next Gen Evaluation Overview.
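
As a starting point for monitoring, the sketch below fetches the evaluation by ID. The retrieve method name is an assumption based on the SDK's create/retrieve conventions and is not shown elsewhere in this guide; confirm the exact call and status fields in the API reference:
# Hypothetical status check -- the method name is an assumption,
# not confirmed by this guide
evaluation = client.evaluations.retrieve(voice_evaluation.id)
print(evaluation)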

Example Annotation Flow

When contributors access the evaluation queue, they’ll see a clean interface displaying the audio player along with any metadata you’ve included (such as call IDs or descriptions). The audio player allows them to play, pause, and scrub through the audio file. As they listen, they can use the timestamp controls to mark specific moments when events occur (e.g., mispronunciations). If multi=True was set, they can add multiple timestamps for each audio file. Once satisfied with their annotations, they submit their responses and move on to the next item in the queue.