- Evaluate applications with multiple inputs and outputs.
- Evaluate multi-step applications, and let users evaluate each step individually.
- Surface exactly the right data to human evaluators, so you can increase evaluation velocity.
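
As a rough illustration of the first two points, here is a minimal sketch of per-step evaluation in plain Python. It does not use any particular product's API: `StepRecord`, `evaluate_steps`, and the two evaluator functions are hypothetical names made up for this example, and the two-step "retrieve then generate" trace is invented sample data. Each step carries multiple named inputs and outputs, and each step is scored by its own evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepRecord:
    """One step of a multi-step application run (hypothetical structure)."""
    name: str
    inputs: Dict[str, str]    # a step may have several named inputs
    outputs: Dict[str, str]   # and several named outputs


# An evaluator takes one step record and returns a score between 0 and 1.
Evaluator = Callable[[StepRecord], float]


def evaluate_steps(trace: List[StepRecord],
                   evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Score every step that has a registered evaluator, keyed by step name."""
    return {
        step.name: evaluators[step.name](step)
        for step in trace
        if step.name in evaluators
    }


def retrieval_has_context(step: StepRecord) -> float:
    # Passes if the retrieval step returned a non-empty context.
    return 1.0 if step.outputs.get("context") else 0.0


def answer_uses_context(step: StepRecord) -> float:
    # Passes if the retrieved context appears verbatim in the answer.
    return 1.0 if step.inputs["context"] in step.outputs["answer"] else 0.0


# Invented sample trace: a retrieval step followed by a generation step.
trace = [
    StepRecord(
        name="retrieve",
        inputs={"question": "What is the capital of France?"},
        outputs={"context": "Paris"},
    ),
    StepRecord(
        name="generate",
        inputs={"question": "What is the capital of France?", "context": "Paris"},
        outputs={"answer": "The capital of France is Paris."},
    ),
]

scores = evaluate_steps(
    trace,
    {"retrieve": retrieval_has_context, "generate": answer_uses_context},
)
print(scores)
```

Keeping one score per step name makes it possible to see which stage of a pipeline failed, and the same `StepRecord` fields could be shown to a human evaluator instead of being passed to an automated check.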