Train mounts cloud storage into your container before the job starts and sets STORAGE_MOUNT_0_PATH, STORAGE_MOUNT_1_PATH, etc. as environment variables. Your training script reads from these paths, with no cloud-specific SDKs or storage code needed.

Mount Configuration

Each entry in storage_mounts maps a cloud URI to a container path:
"storage_mounts": [
  {
    "source_uri": "gs://my-bucket/training-data",
    "mount_path": "/mnt/data",
    "read_only": true
  },
  {
    "source_uri": "gs://my-bucket/output/run-1",
    "mount_path": "/mnt/output",
    "read_only": false
  }
]
Inside the container:
STORAGE_MOUNT_0_PATH=/mnt/data
STORAGE_MOUNT_1_PATH=/mnt/output
Mounts are indexed in order, starting at zero.
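Because mounts are numbered sequentially from zero, a script can discover all of them without hard-coding a count. A minimal sketch (the helper name `discover_mounts` is ours, not part of the platform):

```python
import os

def discover_mounts():
    """Collect mount paths from STORAGE_MOUNT_<i>_PATH env vars, starting at index 0."""
    mounts = []
    i = 0
    while True:
        path = os.environ.get(f"STORAGE_MOUNT_{i}_PATH")
        if path is None:
            # Indices are contiguous, so the first gap marks the end.
            break
        mounts.append(path)
        i += 1
    return mounts
```

This keeps the script agnostic to how many mounts a given job configures.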

URI Formats

Cloud   Format                                  Example
GCP     gs://bucket/path                        gs://my-bucket/datasets/imagenet
AWS     s3://bucket/path                        s3://my-bucket/datasets/imagenet
Azure   azureml://datastores/name/paths/path    azureml://datastores/training/paths/imagenet
Azure storage mounts reference registered Azure ML datastores, not raw Blob Storage URLs. Datastores are configured in your Azure ML workspace.
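For example, a mount entry referencing a datastore named `training` (the datastore name and path here are illustrative) looks the same as any other mount, only the URI scheme differs:

```json
{
  "source_uri": "azureml://datastores/training/paths/imagenet",
  "mount_path": "/mnt/data",
  "read_only": true
}
```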

Accessing Mounts in Your Container

import os
import torch
from datasets import load_dataset

data_path = os.environ["STORAGE_MOUNT_0_PATH"]
output_path = os.environ["STORAGE_MOUNT_1_PATH"]

dataset = load_dataset(data_path)

# ... training loop ...
torch.save(model.state_dict(), f"{output_path}/checkpoint-epoch-{epoch}.pt")
This works identically on GCP, AWS, and Azure.

Read vs. Write

Set read_only based on intent. Use true for input data and pretrained weights; false for outputs, checkpoints, and logs. Marking input mounts read-only prevents accidental writes and may improve performance on some backends.
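Since a misconfigured read_only flag typically only surfaces at the first checkpoint write, it can be worth failing fast at startup. A minimal sketch of such a guard (the helper name `assert_mount_writable` is ours, and the exact error a read-only mount raises may vary by backend):

```python
import os

def assert_mount_writable(path):
    """Fail fast if an output mount cannot be written to."""
    probe = os.path.join(path, ".write_probe")
    try:
        # Attempt a real write; os.access() can be unreliable on FUSE mounts.
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
    except OSError as e:
        raise RuntimeError(f"Mount {path} is not writable: {e}")
```

Call it on the output mount before the training loop starts, so a bad configuration fails in seconds rather than after the first epoch.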

SageMaker: Input Data Config

SageMaker also supports input_data_config for S3 input channels. Unlike FUSE mounts, channels are downloaded to the instance before training starts, which can be faster for large datasets with random access patterns:
"input_data_config": [{
  "channel_name": "train",
  "data_source": {
    "s3_data_source": {
      "s3_data_type": "S3Prefix",
      "s3_uri": "s3://my-bucket/datasets/train/",
      "s3_data_distribution_type": "FullyReplicated"
    }
  },
  "input_mode": "File"
}]
SageMaker downloads the channel to /opt/ml/input/data/train/ (the directory name comes from channel_name). Channels can be used alongside storage_mounts.
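Inside the container, a File-mode channel is just a local directory under /opt/ml/input/data/<channel_name>/. A small sketch for enumerating its contents (the helper name `list_channel_files` is ours; the base path follows the SageMaker convention above):

```python
import os

def list_channel_files(channel_name, base="/opt/ml/input/data"):
    """Return all files downloaded into a SageMaker File-mode input channel."""
    root = os.path.join(base, channel_name)
    files = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            files.append(os.path.join(dirpath, name))
    return sorted(files)
```

Usage inside training code would be `list_channel_files("train")`, matching the channel_name in input_data_config.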

Next Steps