# Storage Mounts

> Mount cloud storage into training containers using a unified interface across GCP, AWS, and Azure.

Train mounts cloud storage into your container before the job starts and exposes each mount path as an environment variable: `STORAGE_MOUNT_0_PATH`, `STORAGE_MOUNT_1_PATH`, and so on. Your training script reads from and writes to these paths, so it needs no cloud-specific SDKs or storage code.

***

## Mount Configuration

Each entry in `storage_mounts` maps a cloud URI to a container path:

```json
"storage_mounts": [
  {
    "source_uri": "gs://my-bucket/training-data",
    "mount_path": "/mnt/data",
    "read_only": true
  },
  {
    "source_uri": "gs://my-bucket/output/run-1",
    "mount_path": "/mnt/output",
    "read_only": false
  }
]
```

Inside the container:

```
STORAGE_MOUNT_0_PATH=/mnt/data
STORAGE_MOUNT_1_PATH=/mnt/output
```

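Mounts are indexed in the order they appear in `storage_mounts`, starting at zero, so the first entry becomes `STORAGE_MOUNT_0_PATH` and the second `STORAGE_MOUNT_1_PATH`.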
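
The indexing rule can be sketched in a few lines of Python. This is an illustration only, not part of Train: `expected_env` and `discover_mounts` are hypothetical helper names, and the sketch assumes the variables are numbered contiguously from zero as described above.

```python
import os
from typing import Mapping


def expected_env(storage_mounts: list[dict]) -> dict[str, str]:
    """Map each mount to the variable name implied by its position in the list."""
    return {
        f"STORAGE_MOUNT_{i}_PATH": mount["mount_path"]
        for i, mount in enumerate(storage_mounts)
    }


def discover_mounts(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Collect STORAGE_MOUNT_<i>_PATH values in index order, stopping at the first gap."""
    paths = []
    i = 0
    while (path := environ.get(f"STORAGE_MOUNT_{i}_PATH")) is not None:
        paths.append(path)
        i += 1
    return paths
```

Calling `discover_mounts()` with no arguments inside the container reads the real environment, which lets a script handle any number of mounts without hard-coding indices.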

***

## URI Formats

| Cloud | Format                                 | Example                                        |
| ----- | -------------------------------------- | ---------------------------------------------- |
| GCP   | `gs://bucket/path`                     | `gs://my-bucket/datasets/imagenet`             |
| AWS   | `s3://bucket/path`                     | `s3://my-bucket/datasets/imagenet`             |
| Azure | `azureml://datastores/name/paths/path` | `azureml://datastores/training/paths/imagenet` |

<Note>
  Azure storage mounts reference registered Azure ML datastores, not raw Blob Storage URLs. Datastores are configured in your Azure ML workspace.
</Note>
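
If you build `storage_mounts` programmatically, a quick client-side check can catch a malformed URI before submission. The following is a minimal sketch based only on the formats in the table above; `cloud_for_uri` is a hypothetical helper, and Train's own validation may accept or reject different inputs.

```python
from urllib.parse import urlparse

# Scheme-to-cloud mapping taken from the URI format table.
_SCHEMES = {"gs": "GCP", "s3": "AWS", "azureml": "Azure"}


def cloud_for_uri(source_uri: str) -> str:
    """Return the cloud a source_uri targets, or raise ValueError if it fits no format."""
    parsed = urlparse(source_uri)
    cloud = _SCHEMES.get(parsed.scheme)
    if cloud is None:
        raise ValueError(f"unsupported scheme in {source_uri!r}")
    if cloud == "Azure":
        # Expected shape: azureml://datastores/<name>/paths/<path>
        parts = parsed.path.strip("/").split("/")
        well_formed = (
            parsed.netloc == "datastores"
            and len(parts) >= 3
            and parts[0] != ""
            and parts[1] == "paths"
        )
        if not well_formed:
            raise ValueError(f"expected azureml://datastores/name/paths/path, got {source_uri!r}")
    elif not parsed.netloc:
        raise ValueError(f"missing bucket name in {source_uri!r}")
    return cloud
```

A raw Blob Storage URL such as one starting with `https://` raises `ValueError` here, consistent with the note that Azure mounts must reference a registered datastore.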

***

## Accessing Mounts in Your Container

```python focus={5-6}
import os
import torch
from datasets import load_dataset

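data_path = os.environ["STORAGE_MOUNT_0_PATH"]    # first mount: input data
output_path = os.environ["STORAGE_MOUNT_1_PATH"]  # second mount: outputs

dataset = load_dataset(data_path)  # loads directly from the mounted directory

# ... training loop that defines `model` and `epoch` ...
checkpoint = os.path.join(output_path, f"checkpoint-epoch-{epoch}.pt")
torch.save(model.state_dict(), checkpoint)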
```

The same script runs unchanged on GCP, AWS, and Azure, because it only reads the mount paths from the environment.

***

## Read vs. Write

Set `read_only` based on intent. Use `true` for input data and pretrained weights, and `false` for outputs, checkpoints, and logs. Marking input mounts read-only prevents accidental writes and may improve performance on some backends.
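
A misconfigured output mount is cheaper to catch at startup than after the first epoch. The sketch below probes a mount by creating and removing a small temporary file. `is_writable` is a hypothetical helper, and it assumes a read-only or missing mount surfaces as an `OSError`, which may vary by backend.

```python
import tempfile


def is_writable(mount_path: str) -> bool:
    """Return True if a small temporary file can be created under mount_path."""
    try:
        # The probe file is deleted automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=mount_path, prefix=".write-probe-"):
            pass
        return True
    except OSError:
        return False
```

A training script could call `is_writable(os.environ["STORAGE_MOUNT_1_PATH"])` before the training loop and exit early with a clear message if it returns `False`.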

***

## SageMaker: Input Data Config

SageMaker also supports `input_data_config` for S3 input channels. Unlike a FUSE mount, a channel with `input_mode` set to `File` is downloaded in full to the instance before training starts, which can be faster for large datasets with random access patterns:

```json
"input_data_config": [{
  "channel_name": "train",
  "data_source": {
    "s3_data_source": {
      "s3_data_type": "S3Prefix",
      "s3_uri": "s3://my-bucket/datasets/train/",
      "s3_data_distribution_type": "FullyReplicated"
    }
  },
  "input_mode": "File"
}]
```

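SageMaker makes each channel available at `/opt/ml/input/data/<channel_name>/`, which is `/opt/ml/input/data/train/` for the channel above. You can use `input_data_config` alongside `storage_mounts` in the same job.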
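
Because both mechanisms can be present in one job, a script that should also run on other backends needs a way to pick its input directory. Here is one possible sketch: `resolve_data_dir` is a hypothetical helper, and preferring the SageMaker channel over the storage mount is a design choice made for this example, not documented Train behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Union

# Directory under which SageMaker places input channels.
SAGEMAKER_INPUT_ROOT = Path("/opt/ml/input/data")


def resolve_data_dir(
    channel_name: str = "train",
    mount_index: int = 0,
    environ: Mapping[str, str] = os.environ,
    input_root: Union[str, Path] = SAGEMAKER_INPUT_ROOT,
) -> Path:
    """Prefer a downloaded SageMaker channel; otherwise fall back to a storage mount."""
    channel_dir = Path(input_root) / channel_name
    if channel_dir.is_dir():
        return channel_dir
    mount = environ.get(f"STORAGE_MOUNT_{mount_index}_PATH")
    if mount is None:
        raise FileNotFoundError(
            f"no channel {channel_name!r} under {input_root} "
            f"and STORAGE_MOUNT_{mount_index}_PATH is not set"
        )
    return Path(mount)
```

On GCP or Azure the channel directory does not exist, so the function returns the mount path and the rest of the script stays the same.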

***

## Next Steps

* **[Getting Started](/docs/capabilities/training/getting-started-with-train)**: Full job submission examples per backend
* **[Custom Images](/docs/capabilities/training/custom-images)**: Build and push training containers
