# Custom Function Evaluation

> Define your own Python scoring functions and run them as evaluation tasks.

## Overview

Use the `custom_function` task type to run your own Python scoring logic on each evaluation item, giving you full control over scoring beyond the built-in metrics. The function source code is extracted, sent to the API, and executed server-side in a sandboxed environment.

<Info>The [`scale-gp-beta`](https://pypi.org/project/scale-gp-beta/) SDK provides a `CustomFunction` helper that handles source extraction and serialization for you.</Info>

## SDK Helper

The `CustomFunction` class wraps a Python callable and provides methods to serialize it for the API and test it with a dry run.

```python theme={null}
from scale_gp_beta import SGPClient
from scale_gp_beta.lib import CustomFunction

client = SGPClient(account_id=..., api_key=...)

def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

cf = CustomFunction(func=string_similarity)
```

<Warning>Any imports your function needs must be placed **inside the function body**. Module-level imports are not captured when the source is extracted and will not be available at execution time.</Warning>
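
To illustrate why module-level imports are lost, the sketch below treats the text of the `def` block as the only thing that reaches the executor. It is an illustration under assumptions: `run_extracted` is a hypothetical helper, and plain `exec` stands in for the server-side sandbox, whose actual behavior may differ.

```python
# Source as it might look after extraction: only the def block, no module-level imports.
SOURCE_WITHOUT_IMPORT = (
    "def similarity(expected, actual):\n"
    "    return difflib.SequenceMatcher(None, expected, actual).ratio()\n"
)

# Same function, with the import moved inside the body.
SOURCE_WITH_IMPORT = (
    "def similarity(expected, actual):\n"
    "    import difflib\n"
    "    return difflib.SequenceMatcher(None, expected, actual).ratio()\n"
)


def run_extracted(source, func_name, **kwargs):
    """Hypothetical helper: define the function in a fresh namespace and call it."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](**kwargs)


try:
    run_extracted(SOURCE_WITHOUT_IMPORT, "similarity", expected="abc", actual="abc")
    missing_import_error = None
except NameError as exc:
    # difflib was never imported in the fresh namespace.
    missing_import_error = str(exc)

score_with_import = run_extracted(
    SOURCE_WITH_IMPORT, "similarity", expected="abc", actual="abc"
)
```

The first call fails with a `NameError` because `difflib` is unknown in the fresh namespace, while the second succeeds because the import travels with the function body.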

<Expandable title="CustomFunction constructor parameters" defaultOpen={true}>
  <ResponseField name="func" type="Callable" required>
    The Python function to wrap. Must be a named function defined in a regular `.py` file — lambdas and built-in functions are not supported.
  </ResponseField>

  <ResponseField name="alias" type="string">
    Display name for the task result. Defaults to the function's `__name__`.
  </ResponseField>

  <ResponseField name="arg_mapping" type="Dict[str, str]">
    Maps function parameter names to dataset column names. When omitted, each parameter is matched to the column with the same name. Values are automatically prefixed with `item.` unless they already start with `item.`, so nested paths like `item.nested.field` pass through unchanged.
  </ResponseField>
</Expandable>

### Allowed Imports

Custom functions run in a restricted environment. Only these standard library modules are available:

`math`, `json`, `re`, `collections`, `itertools`, `functools`, `statistics`, `decimal`, `fractions`, `datetime`, `copy`, `textwrap`, `difflib`, `unicodedata`
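
A local pre-check can catch a disallowed import before you submit the function. The sketch below parses function source with the standard `ast` module and compares each import against the list above. `find_disallowed_imports` is a hypothetical helper, not part of the SDK, and it says nothing about how the server enforces the restriction.

```python
import ast
from typing import List

# Module names copied from the allowed list in this section.
ALLOWED_MODULES = {
    "math", "json", "re", "collections", "itertools", "functools",
    "statistics", "decimal", "fractions", "datetime", "copy",
    "textwrap", "difflib", "unicodedata",
}


def find_disallowed_imports(source: str) -> List[str]:
    """Return the sorted names of imported modules that are not in ALLOWED_MODULES."""
    disallowed = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package, e.g. "os" for "os.path".
            if name.split(".")[0] not in ALLOWED_MODULES:
                disallowed.add(name)
    return sorted(disallowed)


example_source = (
    "def score(expected, actual):\n"
    "    import math\n"
    "    import numpy as np\n"
    "    return 1.0\n"
)
flagged = find_disallowed_imports(example_source)
```

Here `flagged` contains only `numpy`, since `math` is on the allowed list.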

### Return Type

Functions must return an `int` or `float`. Returning other types (including `bool`) will produce an error.
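
The `bool` restriction is easy to trip over because `bool` is a subclass of `int` in Python, so a naive `isinstance` check accepts `True` and `False`. The sketch below shows a local check that follows the documented rule, plus one way to turn a comparison into a valid score. Both function names are illustrative, and the server-side validation may be implemented differently.

```python
def is_valid_score(value) -> bool:
    """Local pre-check for the documented rule: int or float, but not bool."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))


def exact_match(expected: str, actual: str) -> float:
    # Convert the boolean comparison to a float before returning it.
    return float(expected == actual)
```

Wrapping the comparison in `float(...)` keeps the intent of a pass/fail metric while returning an accepted type.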

## Using `arg_mapping`

By default, function parameter names are matched to dataset column names. Use `arg_mapping` when your function parameters don't match the column names in your data:

```python theme={null}
def score(a: float, b: float) -> float:
    return a - b

cf = CustomFunction(
    func=score,
    alias="difference_score",
    arg_mapping={"a": "model_score", "b": "baseline_score"},
)
```

Values that already start with `item.` are passed through unchanged, which lets you reference nested fields:

```python theme={null}
cf = CustomFunction(
    func=score,
    arg_mapping={"a": "item.nested.model_score", "b": "baseline_score"},
)
# produces: {"a": "item.nested.model_score", "b": "item.baseline_score"}
```
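
The prefixing rule can be reproduced in a few lines. This is a minimal sketch of the behavior described above, not the SDK's actual implementation, and `normalize_arg_mapping` is a hypothetical name.

```python
from typing import Dict


def normalize_arg_mapping(arg_mapping: Dict[str, str]) -> Dict[str, str]:
    """Prefix each value with 'item.' unless it already starts with it."""
    normalized = {}
    for param, column in arg_mapping.items():
        if column.startswith("item."):
            # Already a full path: pass through unchanged.
            normalized[param] = column
        else:
            normalized[param] = "item." + column
    return normalized


mapping = normalize_arg_mapping(
    {"a": "item.nested.model_score", "b": "baseline_score"}
)
```

With the input from the example above, `mapping` matches the serialized form shown in the comment there.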

## Dry Run

Test your function against sample data before creating a full evaluation. The dry run endpoint is synchronous — it executes your function inline and returns per-row results immediately.

```python theme={null}
result = cf.dry_run(
    client=client,
    sample_data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
)
print(result)
```
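
Before calling the endpoint, it can help to exercise the function locally on the same sample rows. The sketch below is a local stand-in only: `local_dry_run` is a hypothetical helper, it assumes parameter names match column names directly, and its result shape is an assumption rather than the format the API returns.

```python
import inspect


def local_dry_run(func, sample_data):
    """Call func once per row, matching parameter names to row keys."""
    params = list(inspect.signature(func).parameters)
    results = []
    for row in sample_data:
        try:
            kwargs = {name: row[name] for name in params}
            results.append({"score": func(**kwargs), "error": None})
        except Exception as exc:
            # Record the failure per row instead of aborting the whole run.
            results.append({"score": None, "error": f"{type(exc).__name__}: {exc}"})
    return results


def length_ratio(expected: str, actual: str) -> float:
    if len(expected) == 0:
        return 0.0
    return len(actual) / len(expected)


results = local_dry_run(
    length_ratio,
    [
        {"expected": "abcd", "actual": "ab"},
        {"expected": "abcd"},  # missing "actual" column
    ],
)
```

The second row is missing a column, so its entry carries an error message instead of a score. This kind of per-row reporting makes it easier to spot data problems before the server-side run.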

## Discovering Columns

To see which columns an existing evaluation exposes for `arg_mapping`, call `get_evaluation_columns`:

```python theme={null}
from scale_gp_beta.lib import get_evaluation_columns

columns = get_evaluation_columns(client, "eval_abc123")
for col in columns:
    print(col.field_name, col.data_type)
```

## Example Usage

The following example creates an evaluation with two custom function tasks.

```python theme={null}
from scale_gp_beta import SGPClient
from scale_gp_beta.lib import CustomFunction

client = SGPClient(account_id=..., api_key=...)

def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

cf_similarity = CustomFunction(func=string_similarity)

def length_ratio(expected: str, actual: str) -> float:
    if len(expected) == 0:
        return 0.0
    return len(actual) / len(expected)

cf_length = CustomFunction(func=length_ratio)

evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[cf_similarity.serialize(), cf_length.serialize()],
)
```

You can also create evaluations using the raw task dict directly, without the SDK helper:

```python theme={null}
evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.", "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[
        {
            "task_type": "custom_function",
            "alias": "string_similarity",
            "configuration": {
                "function_source": "def string_similarity(expected: str, actual: str) -> float:\n    import difflib\n    return difflib.SequenceMatcher(None, expected, actual).ratio()\n",
            }
        }
    ],
)
```

Poll for completion and inspect results:

```python theme={null}
import time

# Poll every 5 seconds until the evaluation reaches a terminal status.
while True:
    evaluation = client.evaluations.retrieve(evaluation.id)
    if evaluation.status in ("completed", "failed"):
        break
    time.sleep(5)

items = client.evaluation_items.list(evaluation_id=evaluation.id)
for item in items:
    print(item.data)  # scores are stored under the task alias key, e.g. data["string_similarity"]
    if item.task_errors:
        print(item.task_errors)
```
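
The loop above runs until the evaluation reaches a terminal status. If you would rather give up after a fixed time, one option is a small polling helper with a deadline. The sketch below is generic Python: `wait_for_terminal_status` and its parameters are hypothetical names, and the terminal statuses are the two used in the loop above.

```python
import time


def wait_for_terminal_status(
    fetch_status,
    terminal=("completed", "failed"),
    interval_s=5.0,
    timeout_s=600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_status() until it returns a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s}s")
        sleep(interval_s)


# With the SDK client from the examples above, fetch_status could be:
#   lambda: client.evaluations.retrieve(evaluation.id).status
# Here a canned sequence stands in for the API so the sketch runs on its own.
fake_statuses = iter(["running", "running", "completed"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses), sleep=lambda seconds: None
)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real waiting.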

Once the evaluation completes, you can also view the results — including per-item custom function scores — in the SGP UI.

## Constraints

| Constraint         | Detail                                                    |
| :----------------- | :-------------------------------------------------------- |
| Return type        | `int` or `float` only (not `bool`)                        |
| Imports            | Must be inside function body; only allowed stdlib modules |
| Function signature | No `*args`, `**kwargs`, async, or lambdas                 |
| Dry run timeout    | 10s per row, 30s total                                    |
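
The signature constraints in the table can be checked locally with the standard `inspect` module. The sketch below is one possible pre-check, assuming the rules are exactly those listed above. `signature_problems` is a hypothetical helper and does not reflect how the SDK or the server validates functions.

```python
import inspect
from typing import List


def signature_problems(func) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if getattr(func, "__name__", "") == "<lambda>":
        problems.append("lambdas are not supported")
    if inspect.iscoroutinefunction(func):
        problems.append("async functions are not supported")
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            problems.append(f"*{param.name} is not supported")
        elif param.kind is inspect.Parameter.VAR_KEYWORD:
            problems.append(f"**{param.name} is not supported")
    return problems


def good_score(expected: str, actual: str) -> float:
    return 1.0


def variadic_score(*args, **kwargs) -> float:
    return 1.0


async def async_score(expected: str, actual: str) -> float:
    return 1.0


good_result = signature_problems(good_score)
variadic_result = signature_problems(variadic_score)
async_result = signature_problems(async_score)
lambda_result = signature_problems(lambda expected, actual: 1.0)
```

Only `good_score` comes back with an empty list. The other three each report the specific rule they break.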

<Expandable title="configuration properties" defaultOpen={true}>
  <ResponseField name="function_source" type="string" required>
    The full source code of the Python function to execute. The `CustomFunction` helper extracts this automatically via `inspect.getsource()`.
  </ResponseField>

  <ResponseField name="arg_mapping" type="object">
    A mapping from function parameter names to `ItemLocator` strings (e.g. `{"param": "item.column_name"}`). When omitted, parameter names are matched directly to column names.
  </ResponseField>
</Expandable>