Generate Dataset through the SDK

SDK instructions for uploading datasets can be found here.

Generate Dataset through the UI

To create a new safety dataset from the platform, navigate to Evaluation Datasets. From there, select Create Dataset and choose the Using Knowledge Base option.

Configure Options

To generate a dataset, provide the following:

  • Knowledge Base: The source of data for your evaluation dataset. It includes any documents, CSV uploads, or other data sources from which information is pulled.
  • Evaluation Dataset Name: The name of the dataset, used to find it after creation when you run evaluations.
  • Advanced Settings (Optional): Controls how many test cases are generated and how they are grouped.
    • Number of Test Cases: The number of test cases to generate.
    • Group Test Cases by Artifact: If you select this option, each test case is created with knowledge pulled from a single artifact. For example, if your Knowledge Base contains several documents, no test case will combine data from more than one document.
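The options above can be pictured as a small configuration object. The sketch below is illustrative only and is not the platform's SDK: the names `GenerationConfig`, `Chunk`, and `is_allowed` are hypothetical, and the check simply restates the Group Test Cases by Artifact rule described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    """A piece of knowledge pulled from one artifact (hypothetical model)."""
    artifact: str  # e.g. the name of the source document
    text: str


@dataclass
class GenerationConfig:
    """Hypothetical mirror of the options in the Configure Options form."""
    knowledge_base: str
    dataset_name: str
    num_test_cases: Optional[int] = None  # None means "use the default"
    group_by_artifact: bool = False

    def validate(self) -> None:
        # Knowledge Base and dataset name are required; the rest is optional.
        if not self.knowledge_base or not self.dataset_name:
            raise ValueError("knowledge_base and dataset_name are required")
        if self.num_test_cases is not None and self.num_test_cases < 1:
            raise ValueError("num_test_cases must be at least 1")


def is_allowed(chunks: List[Chunk], config: GenerationConfig) -> bool:
    """Return True if a test case built from `chunks` respects the grouping option."""
    if not config.group_by_artifact:
        return True
    # With grouping enabled, every chunk must come from the same artifact.
    return len({chunk.artifact for chunk in chunks}) <= 1
```

With grouping enabled, a candidate test case built from chunks of `a.pdf` and `b.pdf` would be rejected, while one built only from `a.pdf` would be accepted.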

Dataset Generation

After you configure dataset generation, the dataset may take some time to generate. The wait depends on the size of your Knowledge Base and the number of test cases you requested.

Approve Test Cases and Publish Datasets

Approve and Edit Test Cases

After the dataset is generated, you can select which test cases to approve. You can also edit the content of each test case directly in the UI. To see what data was used to generate the Input and Expected Output, click the View Chunks button.

Once you approve a test case, you can no longer edit it or undo the approval.
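The one-way nature of approval can be sketched as a tiny state model. This is a conceptual illustration under assumed names (`DraftTestCase`, `edit`, `approve` are hypothetical), not how the platform implements it: edits succeed while a test case is unapproved and are rejected afterwards, and there is deliberately no method to undo approval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftTestCase:
    """Hypothetical model of a generated test case awaiting review."""
    input: str
    expected_output: str
    approved: bool = False

    def edit(self, input: Optional[str] = None,
             expected_output: Optional[str] = None) -> None:
        # Editing is only possible before approval.
        if self.approved:
            raise PermissionError("approved test cases cannot be edited")
        if input is not None:
            self.input = input
        if expected_output is not None:
            self.expected_output = expected_output

    def approve(self) -> None:
        # Approval is final: no corresponding "unapprove" exists.
        self.approved = True
```

In practice this means any corrections to the Input or Expected Output should be made before approving the test case.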

To add more test cases to this dataset, click the Generate More button.

Publish Dataset

Once you are satisfied with the test cases in this dataset, click Publish to publish your dataset.