Each SGP workspace has a Docker-compatible registry. You push images to it and reference them in job configs; Train takes care of getting the right image onto the right cloud backend.

Push an Image

1. Authenticate

Log in with your SGP credentials:
docker login docker-registry.your-sgp-deployment-url \
  --username $SGP_ACCOUNT_ID \
  --password $SGP_API_KEY
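To keep the API key out of shell history and process listings, you can instead use Docker's --password-stdin flag, which reads the secret from standard input:

```shell
echo "$SGP_API_KEY" | docker login docker-registry.your-sgp-deployment-url \
  --username "$SGP_ACCOUNT_ID" \
  --password-stdin
```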
2. Build and tag

docker build -t my-training-job:v1 .
docker tag my-training-job:v1 docker-registry.your-sgp-deployment-url/my-training-job:v1
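As a shorthand, you can tag with the full registry path at build time and skip the separate docker tag step:

```shell
docker build -t docker-registry.your-sgp-deployment-url/my-training-job:v1 .
```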
3. Push

docker push docker-registry.your-sgp-deployment-url/my-training-job:v1
4. Reference in your job config

Use the registry URL directly in your job config. Train resolves it automatically at submission:
"image_uri": "docker-registry.your-sgp-deployment-url/my-training-job:v1"
Because the reference is resolved at submission, the exact image you pushed is what runs, even if the tag is overwritten later.
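For context, a job config might embed the registry reference like this. Only image_uri comes from this page; the surrounding fields (name, command) are illustrative assumptions, not a documented schema:

```json
{
  "name": "my-training-job",
  "image_uri": "docker-registry.your-sgp-deployment-url/my-training-job:v1",
  "command": ["python3", "train.py"]
}
```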

Multi-Architecture Builds

Cloud training instances run linux/amd64. If you build on Apple Silicon, target that platform explicitly:
docker buildx build \
  --platform linux/amd64 \
  -t docker-registry.your-sgp-deployment-url/my-training-job:v1 \
  --push .
A linux/arm64 image scheduled onto an amd64 instance fails at startup with an exec format error.
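To catch an architecture mismatch before submitting a job, inspect what you built; both commands are standard Docker CLI:

```shell
# Architecture of a locally built image
docker inspect --format '{{.Os}}/{{.Architecture}}' my-training-job:v1

# Platforms published for an already-pushed tag (queries the registry)
docker manifest inspect docker-registry.your-sgp-deployment-url/my-training-job:v1
```

If the first command prints anything other than linux/amd64, rebuild with --platform linux/amd64 before pushing.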

Example Dockerfile

FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY train.py .

CMD ["python3", "train.py"]
Your training script reads inputs and writes outputs through the STORAGE_MOUNT_*_PATH environment variables, so no cloud-specific paths are needed:
import os
import torch

data_dir = os.environ["STORAGE_MOUNT_0_PATH"]
output_dir = os.environ["STORAGE_MOUNT_1_PATH"]

# ... training loop ...
torch.save(model.state_dict(), f"{output_dir}/model.pt")
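If a job attaches more than two storage mounts, the same convention generalizes. Here is a small helper, a sketch assuming mounts are numbered STORAGE_MOUNT_0_PATH, STORAGE_MOUNT_1_PATH, and so on, that collects them in index order:

```python
import os

def discover_storage_mounts():
    """Collect STORAGE_MOUNT_<n>_PATH env vars into an index-ordered list."""
    mounts = {}
    for key, value in os.environ.items():
        if key.startswith("STORAGE_MOUNT_") and key.endswith("_PATH"):
            index = key[len("STORAGE_MOUNT_"):-len("_PATH")]
            if index.isdigit():
                mounts[int(index)] = value
    return [mounts[i] for i in sorted(mounts)]

# Simulate the environment Train would provide (paths are hypothetical):
os.environ["STORAGE_MOUNT_0_PATH"] = "/mnt/data"
os.environ["STORAGE_MOUNT_1_PATH"] = "/mnt/output"
print(discover_storage_mounts())  # → ['/mnt/data', '/mnt/output']
```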

Next Steps