Each SGP workspace has a Docker-compatible registry. You push images to it and reference them in job configs; Train takes care of getting the right image onto the right cloud backend.
Push an Image
Authenticate
Log in with your SGP credentials:

```shell
docker login docker-registry.your-sgp-deployment-url \
  --username $SGP_ACCOUNT_ID \
  --password $SGP_API_KEY
```
Build and tag

```shell
docker build -t my-training-job:v1 .
docker tag my-training-job:v1 docker-registry.your-sgp-deployment-url/my-training-job:v1
```
Push

```shell
docker push docker-registry.your-sgp-deployment-url/my-training-job:v1
```
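The build, tag, and push steps above are easy to script. This sketch only assembles the command lines and prints them (rather than running them), so it works without Docker installed; the registry host, image name, and tag are the placeholders from this page:

```python
import shlex

REGISTRY = "docker-registry.your-sgp-deployment-url"
IMAGE = "my-training-job"
TAG = "v1"

local = f"{IMAGE}:{TAG}"
remote = f"{REGISTRY}/{IMAGE}:{TAG}"

# The same three steps as above: build locally, tag for the
# workspace registry, then push.
commands = [
    ["docker", "build", "-t", local, "."],
    ["docker", "tag", local, remote],
    ["docker", "push", remote],
]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually execute the commands, swap the `print` for `subprocess.run(cmd, check=True)`.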
Reference in your job config
Use the registry URL directly in your job config; Train resolves it automatically at submission:

```json
"image_uri": "docker-registry.your-sgp-deployment-url/my-training-job:v1"
```
Train ensures the exact image you pushed is what runs, even if the tag is updated later.
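A quick pre-submission check can catch a config that still points at a local tag instead of the workspace registry. This is a minimal sketch, not part of Train; the registry host is the placeholder from this page, and the helper name is illustrative:

```python
REGISTRY = "docker-registry.your-sgp-deployment-url"

def check_image_uri(uri: str) -> bool:
    """True if the URI targets the workspace registry and carries a tag."""
    host, _, rest = uri.partition("/")
    return host == REGISTRY and ":" in rest

# A properly tagged registry URI passes; a bare local tag does not.
assert check_image_uri(f"{REGISTRY}/my-training-job:v1")
assert not check_image_uri("my-training-job:v1")
```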
Multi-Architecture Builds
Cloud training instances run linux/amd64. If you’re building on Apple Silicon, cross-compile explicitly:
```shell
docker buildx build \
  --platform linux/amd64 \
  -t docker-registry.your-sgp-deployment-url/my-training-job:v1 \
  --push .
```
Pushing a `linux/arm64` image to an amd64 compute instance will cause the job to fail at startup with an `exec format error`.
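The mismatch above can be caught before pushing. This sketch (not part of Train) checks whether the build host's architecture differs from the job's target, in which case a plain `docker build` would produce an image the instance can't exec:

```python
import platform

def needs_cross_compile(target_arch: str = "amd64") -> bool:
    """True if the host arch differs from the target, i.e. a plain
    `docker build` would emit an image the job can't run."""
    host = platform.machine().lower()
    # Normalize common aliases: x86_64 == amd64, aarch64 == arm64.
    aliases = {"x86_64": "amd64", "aarch64": "arm64"}
    host = aliases.get(host, host)
    return host != target_arch

if needs_cross_compile():
    print("Host is not amd64: use `docker buildx build --platform linux/amd64`")
```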
Example Dockerfile
```dockerfile
FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY train.py .
CMD ["python3", "train.py"]
```
Your training script reads inputs and writes outputs through the `STORAGE_MOUNT_*_PATH` environment variables; no cloud-specific paths are needed:
```python
import os
import torch

data_dir = os.environ["STORAGE_MOUNT_0_PATH"]
output_dir = os.environ["STORAGE_MOUNT_1_PATH"]

# ... training loop ...

torch.save(model.state_dict(), f"{output_dir}/model.pt")
```
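When iterating outside Train, the same lookups can fall back to local directories so the script runs unchanged in both places. The fallback paths below are illustrative, not set by the platform:

```python
import os

# Inside a Train job the STORAGE_MOUNT_*_PATH variables are set by the
# platform; the defaults let the same script run against local dirs.
data_dir = os.environ.get("STORAGE_MOUNT_0_PATH", "./data")
output_dir = os.environ.get("STORAGE_MOUNT_1_PATH", "./outputs")

os.makedirs(output_dir, exist_ok=True)
print(f"reading from {data_dir}, writing to {output_dir}")
```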
Next Steps