External Applications
Integrate off-platform applications with SGP for evaluation
Getting Started
External applications provide a bridge between SGP and AI applications that exist off-platform. To get started, navigate to the “Applications” page on the SGP dashboard, click Create a new Application, and select External AI as the application template.
Integration
You can use the new SGP Python client to bridge your external AI to SGP. This allows you to run test cases from an Evaluation Dataset in your SGP account against your external AI and, subsequently, to run evaluations and generate a Scale Report.
Define an interface
To integrate your external AI with SGP, you'll need to create an interface that runs your application locally. This interface is a Python callable that takes a single prompt and returns your external AI application's formatted output.
For example, an integration with a LangChain model might look like this:
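A minimal sketch of such an interface. To keep the snippet self-contained, `FakeChatModel` stands in for a real LangChain chat model (e.g. one with an `invoke` method, such as `langchain_openai.ChatOpenAI`); swap in your own model when integrating.

```python
class FakeChatModel:
    """Placeholder for a LangChain chat model exposing `invoke`."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


model = FakeChatModel()


def application(prompt: str) -> str:
    """The interface SGP calls for each test case prompt."""
    response = model.invoke(prompt)
    # Format the raw model response into the final output string.
    return response.strip()
```

The only contract the interface must satisfy is prompt string in, formatted output out; everything else (model choice, chains, retries) stays inside your application.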
Custom Metrics
Optionally, you can include custom client-side metrics
to display with your outputs. These values will surface on the details page for any evaluation run you create against your uploaded outputs.
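For instance, the interface could return its output together with a metrics dictionary. The `{"generation_output": ..., "metrics": ...}` shape below is illustrative, not the exact SGP schema; check the client reference for the expected fields.

```python
import time


def application_with_metrics(prompt: str) -> dict:
    """Return the output plus client-side metrics to display with it."""
    start = time.perf_counter()
    output = prompt.upper()  # stand-in for the real model call
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "generation_output": output,
        "metrics": {
            "latency_ms": latency_ms,       # wall-clock time of this call
            "output_chars": len(output),    # any custom numeric metric
        },
    }
```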
Advanced: Other Output Types
If your application uses RAG (or any other methodology) to generate additional context to inject into an LLM prompt, you can use one of our additional output types to include the context with the text response.
Context - String
You can include additional context as a single piece of text using ExternalApplicationOutputContextString.
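A sketch of what that can look like. The dataclass below is an illustrative stand-in for the SDK's ExternalApplicationOutputContextString; the real class ships with the SGP Python client, and the field names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExternalApplicationOutputContextString:
    """Stand-in for the SDK class of the same name (fields assumed)."""
    generation_output: str  # the text response
    context: str            # the retrieved context, as one string


def application(prompt: str) -> ExternalApplicationOutputContextString:
    # A real RAG app would retrieve this from a vector store or search index.
    context = "Paris has been the capital of France since 987."
    answer = f"Answering '{prompt}' using the retrieved context."
    return ExternalApplicationOutputContextString(
        generation_output=answer,
        context=context,
    )
```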
Context - Chunks
If your application retrieves a list of documents to use as additional context, you can use ExternalApplicationOutputContextChunks to include each piece of text as a separate Chunk.
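As with the string variant, a self-contained sketch; the dataclass stands in for the SDK's ExternalApplicationOutputContextChunks, and `chunks` as a plain list of strings is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExternalApplicationOutputContextChunks:
    """Stand-in for the SDK class of the same name (fields assumed)."""
    generation_output: str
    chunks: list  # each retrieved document as its own chunk


def application(prompt: str) -> ExternalApplicationOutputContextChunks:
    # A real app would pull these from its retriever.
    retrieved = [
        "Doc 1: Paris is the capital of France.",
        "Doc 2: France is in western Europe.",
    ]
    answer = f"Answering '{prompt}' using {len(retrieved)} retrieved chunks."
    return ExternalApplicationOutputContextChunks(
        generation_output=answer,
        chunks=retrieved,
    )
```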
Generate Outputs
Now that you’ve defined an interface for your external AI application, you can generate outputs for an evaluation dataset to use for a custom evaluation or to generate a Scale Report. First, initialize an SGPClient
with your API key, and an ExternalApplication
with your external application variant ID and the interface you defined.
Then, find or create an evaluation dataset on the SGP dashboard, and copy the ID and latest version number to generate outputs for its test cases with your external application.
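Put together, the flow might look like the sketch below. The stub classes mimic the SGPClient and ExternalApplication described above so the example runs on its own; the method names and signatures are assumptions to illustrate the flow, so consult the SGP client reference for the real ones.

```python
class SGPClient:
    """Stub: the real client authenticates against SGP with your API key."""

    def __init__(self, api_key: str):
        self.api_key = api_key


class ExternalApplication:
    """Stub mirroring the wrapper described in this doc."""

    def __init__(self, client: SGPClient):
        self.client = client
        self._interface = None

    def initialize(self, application_variant_id: str, application):
        self._variant_id = application_variant_id
        self._interface = application  # the callable defined earlier
        return self

    def generate_outputs(self, evaluation_dataset_id: str,
                         evaluation_dataset_version: int) -> dict:
        # The real SDK fetches the dataset's test cases from SGP and runs
        # the interface on each prompt; here we fake two test cases.
        prompts = {"tc_1": "What is 2+2?", "tc_2": "Name a color."}
        return {tc_id: self._interface(p) for tc_id, p in prompts.items()}


client = SGPClient(api_key="sk-...")
app = ExternalApplication(client).initialize(
    application_variant_id="variant-id-from-dashboard",
    application=lambda prompt: f"answer to: {prompt}",
)
outputs = app.generate_outputs(
    evaluation_dataset_id="dataset-id-from-dashboard",
    evaluation_dataset_version=1,
)
```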
Uploading Precomputed Outputs
Optionally, you can bypass defining an ExternalApplication
interface if you’ve already generated outputs for your evaluation dataset outside of the SGP SDK by creating a mapping of test case IDs to outputs, and using the batch
upload function. This is what the ExternalApplication
library class uses under the hood.
You can also include context chunks used by your application for a given output:
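A sketch of the mapping and the upload call. `batch_upload` here is a stand-in for the SDK's batch upload function, whose exact name and signature should be taken from the SGP client reference; the per-output dict shape is likewise an assumption.

```python
# Mapping of test case IDs to precomputed outputs, optionally including
# the context chunks the application used for each answer.
precomputed_outputs = {
    "test_case_id_1": {"generation_output": "Paris"},
    "test_case_id_2": {
        "generation_output": "The Seine",
        "chunks": [  # context chunks used for this output
            "Doc: The Seine flows through Paris.",
        ],
    },
}


def batch_upload(outputs: dict) -> int:
    """Stand-in: the real function would send each output to SGP."""
    return len(outputs)


uploaded = batch_upload(precomputed_outputs)
```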
Upload interactions to SGP
If you wish to log interactions from your external application to utilize SGP features like evaluations and application monitoring, you can do so by using the interactions.create
method. If you have additional metadata emitted by the internal building blocks of your application (like a reranking
or a completion
component), you can attach them to the interaction as trace spans:
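An illustrative interaction payload with trace spans for a reranking and a completion component. The field names here (`input`, `output`, `trace_spans`, `node_id`, and so on) are assumptions meant to show the shape of the data; consult the interactions.create reference for the exact schema.

```python
from datetime import datetime, timezone


def build_interaction(prompt: str, answer: str) -> dict:
    """Assemble one interaction record with per-component trace spans."""
    return {
        "application_variant_id": "variant-id-from-dashboard",
        "input": {"query": prompt},
        "output": {"response": answer},
        "start_timestamp": datetime.now(timezone.utc).isoformat(),
        "trace_spans": [
            {   # metadata emitted by a reranking component
                "node_id": "reranker",
                "operation_input": {"candidates": 20},
                "operation_output": {"kept": 5},
            },
            {   # metadata emitted by the completion component
                "node_id": "completion",
                "operation_input": {"model": "my-llm"},
                "operation_output": {"completion_tokens": 42},
            },
        ],
    }


interaction = build_interaction("What is the capital of France?", "Paris")
# In the real SDK this would be sent with something like:
# client.interactions.create(**interaction)
```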
The registered interactions will then show up on the Application Monitoring page, which you can reach from the Applications page by clicking on your External Application instance and selecting Monitoring Dashboard. From this page, you can check the details of your interactions, like the request latency or the error rate. Clicking on an interaction shows additional details about it, as well as the information emitted by the trace spans.
Creating threads for interactions
If your application supports multi-turn interactions, you might want to group together interactions that are part of the same conversation. You can do this by assigning a UUID to the first interaction and then using the same UUID for any follow-up interactions.
This will group them together in the interaction details page in the following way:
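For example, reusing one UUID across turns might look like the sketch below; the `thread_id` field name is an assumption, so check the interactions.create reference for the actual parameter that links interactions together.

```python
import uuid

# One UUID per conversation, generated for the first turn and reused after.
thread_id = str(uuid.uuid4())


def log_turn(query: str, response: str, thread_id: str) -> dict:
    payload = {
        "thread_id": thread_id,  # same UUID for every turn in this thread
        "input": {"query": query},
        "output": {"response": response},
    }
    # client.interactions.create(**payload)  # the real call in the SDK
    return payload


first_turn = log_turn("Hi!", "Hello! How can I help?", thread_id)
follow_up = log_turn("Summarize our chat.", "We just said hello.", thread_id)
```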