
Overview

Use the custom_function task type to run your own Python scoring logic on each evaluation item, giving you full control over scoring beyond the built-in metrics. The function source code is extracted, sent to the API, and executed server-side in a sandboxed environment.
The scale-gp-beta SDK provides a CustomFunction helper that handles source extraction and serialization for you.

SDK Helper

The CustomFunction class wraps a Python callable and provides methods to serialize it for the API and test it with a dry run.
from scale_gp_beta import SGPClient
from scale_gp_beta.lib import CustomFunction

client = SGPClient(account_id=..., api_key=...)

def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

cf = CustomFunction(func=string_similarity)
Any imports your function needs must be placed inside the function body. Module-level imports are not captured when the source is extracted and will not be available at execution time.

Allowed Imports

Custom functions run in a restricted environment. Only these standard library modules are available: math, json, re, collections, itertools, functools, statistics, decimal, fractions, datetime, copy, textwrap, difflib, unicodedata
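As a sketch of what fits within this allow-list, here is a scorer that uses only re from the allowed modules, with the import inside the function body:

```python
def token_overlap(expected: str, actual: str) -> float:
    # Import inside the body so it is captured with the source
    import re
    expected_tokens = set(re.findall(r"\w+", expected.lower()))
    actual_tokens = set(re.findall(r"\w+", actual.lower()))
    if not expected_tokens:
        return 0.0
    # Fraction of expected tokens that appear in the actual output
    return len(expected_tokens & actual_tokens) / len(expected_tokens)
```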

Return Type

Functions must return an int or float. Returning any other type produces an error; bool is rejected explicitly, even though it is a subclass of int in Python.
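In practice this means casting truthy checks to a float yourself rather than returning the comparison result directly, for example:

```python
def exact_match(expected: str, actual: str) -> float:
    # Return 1.0/0.0 rather than True/False, since bool is rejected
    return 1.0 if expected.strip() == actual.strip() else 0.0
```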

Using arg_mapping

By default, function parameter names are matched to dataset column names. Use arg_mapping when your function parameters don’t match the column names in your data:
def score(a: float, b: float) -> float:
    return a - b

cf = CustomFunction(
    func=score,
    alias="difference_score",
    arg_mapping={"a": "model_score", "b": "baseline_score"},
)
Values that already start with item. are passed through unchanged, which lets you reference nested fields; bare column names are automatically prefixed with item.:
cf = CustomFunction(
    func=score,
    arg_mapping={"a": "item.nested.model_score", "b": "baseline_score"},
)
# produces: {"a": "item.nested.model_score", "b": "item.baseline_score"}
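This normalization rule can be illustrated with a small sketch (it mirrors the documented behavior, not the SDK's internal code):

```python
def normalize_arg_mapping(mapping: dict) -> dict:
    # Values already prefixed with "item." pass through unchanged;
    # bare column names get the "item." prefix added.
    return {
        param: col if col.startswith("item.") else f"item.{col}"
        for param, col in mapping.items()
    }
```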

Dry Run

Test your function against sample data before creating a full evaluation. The dry run endpoint is synchronous — it executes your function inline and returns per-row results immediately.
result = cf.dry_run(
    client=client,
    sample_data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
)
print(result)
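Because the dry run executes remotely, it can also help to sanity-check the function locally on the same rows first. Calling the function with each row's columns as keyword arguments mirrors how columns are bound to parameters:

```python
def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

sample_data = [
    {"expected": "Hello world", "actual": "Hello wrold"},
    {"expected": "Hello world", "actual": "Hello world"},
]

# Bind each row's columns to the function's parameters by name
local_scores = [string_similarity(**row) for row in sample_data]
```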

Discovering Columns

If you already have an evaluation and want to see which columns are available for arg_mapping:
from scale_gp_beta.lib import get_evaluation_columns

columns = get_evaluation_columns(client, "eval_abc123")
for col in columns:
    print(col.field_name, col.data_type)
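With the column names in hand, you can build an arg_mapping programmatically. A sketch using inspect on a local function (the column names here are hypothetical stand-ins for what the endpoint returns):

```python
import inspect

def score(a: float, b: float) -> float:
    return a - b

# Hypothetical column names, e.g. collected from get_evaluation_columns
columns = ["model_score", "baseline_score"]

# Pair parameters to columns positionally; adjust to taste
params = list(inspect.signature(score).parameters)
arg_mapping = dict(zip(params, columns))
```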

Example Usage

The following is a complete example that creates an evaluation with two custom function tasks.
from scale_gp_beta import SGPClient
from scale_gp_beta.lib import CustomFunction

client = SGPClient(account_id=..., api_key=...)

def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

cf_similarity = CustomFunction(func=string_similarity)

def length_ratio(expected: str, actual: str) -> float:
    if len(expected) == 0:
        return 0.0
    return len(actual) / len(expected)

cf_length = CustomFunction(func=length_ratio)

evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[cf_similarity.serialize(), cf_length.serialize()],
)
You can also create evaluations using the raw task dict directly, without the SDK helper:
evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[
        {
            "task_type": "custom_function",
            "alias": "string_similarity",
            "configuration": {
                "function_source": "def string_similarity(expected: str, actual: str) -> float:\n    import difflib\n    return difflib.SequenceMatcher(None, expected, actual).ratio()\n",
            }
        }
    ],
)
Poll for completion and inspect results:
import time

while True:
    evaluation = client.evaluations.retrieve(evaluation.id)
    if evaluation.status in ("completed", "failed"):
        break
    time.sleep(5)

items = client.evaluation_items.list(evaluation_id=evaluation.id)
for item in items:
    print(item.data)  # scores are stored under the task alias key, e.g. data["string_similarity"]
    if item.task_errors:
        print(item.task_errors)
Once the evaluation completes, you can also view the results — including per-item custom function scores — in the SGP UI.
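For a quick summary, you can also aggregate the per-item scores locally. A sketch over plain dicts shaped like item.data (scores keyed by task alias; the values here are hypothetical):

```python
from statistics import mean

# Hypothetical item.data payloads, with scores under each task alias
items_data = [
    {"string_similarity": 0.91, "length_ratio": 1.0},
    {"string_similarity": 0.95, "length_ratio": 1.02},
]

avg_similarity = mean(d["string_similarity"] for d in items_data)
```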

Constraints

| Constraint | Detail |
| --- | --- |
| Return type | int or float only (not bool) |
| Imports | Must be inside the function body; only allowed stdlib modules |
| Function signature | No *args, **kwargs, async, or lambdas |
| Dry run timeout | 10s per row, 30s total |
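The import constraint can be pre-checked locally before submitting. A sketch that scans a function's source for modules outside the allow-list (this mirrors the documented list; the server's actual validation may differ):

```python
import ast
import textwrap

ALLOWED_MODULES = {
    "math", "json", "re", "collections", "itertools", "functools",
    "statistics", "decimal", "fractions", "datetime", "copy",
    "textwrap", "difflib", "unicodedata",
}

def disallowed_imports(source: str) -> set:
    """Return top-level module names imported in source that are
    not on the allow-list."""
    tree = ast.parse(textwrap.dedent(source))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            used.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            used.add(node.module.split(".")[0])
    return used - ALLOWED_MODULES
```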