Use the custom_function task type to run your own Python scoring logic on each evaluation item, giving you full control over scoring beyond the built-in metrics. The function source code is extracted, sent to the API, and executed server-side in a sandboxed environment.
The scale-gp-beta SDK provides a CustomFunction helper that handles source extraction and serialization for you.
Any imports your function needs must be placed inside the function body. Module-level imports are not captured when the source is extracted and will not be available at execution time.
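For example, a minimal sketch of the required placement (the function name is illustrative; note the `difflib` import lives inside the body, so it travels with the extracted source):

```python
def string_ratio(expected: str, actual: str) -> float:
    # import inside the function body: module-level imports are not
    # captured when the source is extracted and sent to the API
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()
```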
Maps function parameter names to dataset column names. By default, parameter names are matched directly to column names. Values are automatically prefixed with `item.` unless they already start with `item.`, which supports nested paths like `item.nested.field`.
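The prefixing rule can be sketched as a small helper (hypothetical, for illustration only; the SDK applies this normalization for you):

```python
def normalize_locator(value: str) -> str:
    # prefix with "item." unless the value already starts with it,
    # so both "expected" and "item.nested.field" are accepted
    return value if value.startswith("item.") else f"item.{value}"
```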
Custom functions run in a restricted environment. Only the following standard library modules are available: `math`, `json`, `re`, `collections`, `itertools`, `functools`, `statistics`, `decimal`, `fractions`, `datetime`, `copy`, `textwrap`, `difflib`, `unicodedata`.
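As an illustration, a scorer that stays within the sandbox by using only `re` and `collections` (the function name and scoring logic are examples, not part of the SDK):

```python
def token_overlap(expected: str, actual: str) -> float:
    # both modules are on the sandbox allowlist; imports go inside the body
    import re
    from collections import Counter

    exp = Counter(re.findall(r"\w+", expected.lower()))
    act = Counter(re.findall(r"\w+", actual.lower()))
    # fraction of expected tokens (with multiplicity) also present in actual
    return sum((exp & act).values()) / max(sum(exp.values()), 1)
```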
By default, function parameter names are matched to dataset column names. Use arg_mapping when your function parameters don’t match the column names in your data:
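For example, a sketch in which the function parameters `reference` and `prediction` are mapped onto the dataset's `expected` and `actual` columns (the function name is hypothetical; the exact keyword form passed to `CustomFunction` is an assumption based on the parameter description):

```python
def exact_match(reference: str, prediction: str) -> float:
    return 1.0 if reference == prediction else 0.0

# map function parameter names onto dataset column names; the SDK
# normalizes these values to "item.expected" / "item.actual" locators
arg_mapping = {"reference": "expected", "prediction": "actual"}
# usage sketch: CustomFunction(func=exact_match, arg_mapping=arg_mapping)
```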
Test your function against sample data before creating a full evaluation. The dry run endpoint is synchronous — it executes your function inline and returns per-row results immediately.
```python
result = cf.dry_run(
    client=client,
    sample_data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.",
         "actual": "The quick brown fox jumped over the lazy dog."},
    ],
)
print(result)
```
If you already have an evaluation and want to see which columns are available for arg_mapping:
```python
from scale_gp_beta.lib import get_evaluation_columns

columns = get_evaluation_columns(client, "eval_abc123")
for col in columns:
    print(col.field_name, col.data_type)
```
The following is a complete example that creates an evaluation with two custom function tasks.
```python
from scale_gp_beta import SGPClient
from scale_gp_beta.lib import CustomFunction

client = SGPClient(account_id=..., api_key=...)

def string_similarity(expected: str, actual: str) -> float:
    import difflib
    return difflib.SequenceMatcher(None, expected, actual).ratio()

cf_similarity = CustomFunction(func=string_similarity)

def length_ratio(expected: str, actual: str) -> float:
    if len(expected) == 0:
        return 0.0
    return len(actual) / len(expected)

cf_length = CustomFunction(func=length_ratio)

evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.",
         "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[cf_similarity.serialize(), cf_length.serialize()],
)
```
You can also create evaluations using the raw task dict directly, without the SDK helper:
```python
evaluation = client.evaluations.create(
    name="Custom Scoring Evaluation",
    data=[
        {"expected": "Hello world", "actual": "Hello wrold"},
        {"expected": "The quick brown fox jumps over the lazy dog.",
         "actual": "The quick brown fox jumped over the lazy dog."},
    ],
    tasks=[
        {
            "task_type": "custom_function",
            "alias": "string_similarity",
            "configuration": {
                "function_source": (
                    "def string_similarity(expected: str, actual: str) -> float:\n"
                    "    import difflib\n"
                    "    return difflib.SequenceMatcher(None, expected, actual).ratio()\n"
                ),
            },
        }
    ],
)
```
Poll for completion and inspect results:
```python
import time

while True:
    evaluation = client.evaluations.retrieve(evaluation.id)
    if evaluation.status in ("completed", "failed"):
        break
    time.sleep(5)

items = client.evaluation_items.list(evaluation_id=evaluation.id)
for item in items:
    # scores are stored under the task alias key, e.g. data["string_similarity"]
    print(item.data)
    if item.task_errors:
        print(item.task_errors)
```
Once the evaluation completes, you can also view the results — including per-item custom function scores — in the SGP UI.
A mapping from function parameter names to ItemLocator strings (e.g. `{"param": "item.column_name"}`). When omitted, parameter names are matched directly to column names.