# System Manager

> Overview of the SGP System Manager deployment orchestrator

## Overview

System Manager is a Kubernetes operator that deploys and manages the SGP platform. It installs the platform's services and agents into an existing Kubernetes cluster.

## Installation

System Manager is installed as a Helm chart into the Kubernetes cluster during the deployment of the SGP platform. See your cloud provider's corresponding deployment guide for more information.

## Configuration

System Manager is configured via `system-manager-config.json`, which is stored in the cloud provider's secret manager. To change the configuration, either update the secret directly or use the System Manager GUI. In both cases, restart the System Manager deployment to apply the changes:

```bash theme={null}
kubectl rollout restart deployment sgp-system-manager -n sgp-system-manager
```
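When the restart is scripted, it can help to wait for the rollout to finish before opening the GUI again. The following is a minimal sketch that builds the restart command shown above plus a `kubectl rollout status` call as argument lists. The helper names and the 120-second default timeout are illustrative assumptions, not part of System Manager.

```python
import subprocess

NAMESPACE = "sgp-system-manager"
DEPLOYMENT = "sgp-system-manager"


def restart_commands(timeout_seconds: int = 120) -> list:
    """Return the kubectl argument lists to restart System Manager and wait for it."""
    return [
        # Same command as shown above, split into arguments.
        ["kubectl", "rollout", "restart", "deployment", DEPLOYMENT, "-n", NAMESPACE],
        # `rollout status` blocks until the new pods are ready or the timeout expires.
        ["kubectl", "rollout", "status", f"deployment/{DEPLOYMENT}",
         "-n", NAMESPACE, f"--timeout={timeout_seconds}s"],
    ]


def restart_system_manager(timeout_seconds: int = 120) -> None:
    """Run the commands in order; raises CalledProcessError on the first failure."""
    for argv in restart_commands(timeout_seconds):
        subprocess.run(argv, check=True)
```

Calling `restart_system_manager()` requires `kubectl` on the `PATH` and a kubeconfig that points at the target cluster.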

An example configuration file is shown below. The `aws` block is only present for AWS deployments; GCP and Azure deployments use equivalent `gcp` and `azure` blocks instead. The `baseRepository` format also varies by cloud provider (e.g., `oci://<region>-docker.pkg.dev/<project-id>/sgp-<workspace_id>-helm-repository` for GCP, `oci://<account_id>.dkr.ecr.<region>.amazonaws.com/sgp-<workspace_id>-helm-repository` for AWS).

```json theme={null}
{
  "cloudProvider": "<aws|azure|gcp>",
  "baseRepository": "<helm-repository-oci-url>",
  "samlSetupEnabled": true,
  "oidcSetupEnabled": true,
  "deploymentURL": "https://<workspace_id>.workspace.egp.scale.com",
  "workspaceId": "<workspace_id>",
  "authType": "<saml|oidc>",
  "baseDomain": "<base_domain>",
  "scaletrain_tenant_prefix": "<scaletrain_tenant_prefix>",
  "train_tenant_prefix": "<train_tenant_prefix>",
  "deployAgentex": true,
  "deploySae": true,
  "aws": {
    "accountId": "<aws_account_id>",
    "region": "<aws_region>",
    "prefix": "<prefix>",
    "modelEngineS3Bucket": "scale-egp-<workspace_id>-ml",
    "sqsQueuePolicyTemplate": "",
    "sqsQueueTagTemplate": "",
    "clusterName": "<cluster_name>",
    "karpenterIrsaArn": "<karpenter_irsa_arn>",
    "targetGroupArn": "<target_group_arn>",
    "nodeSubnets": "<node_subnets>",
    "nodeSecurityGroup": "<node_security_group>",
    "postgresHostTemporal": "<postgres_host_temporal>",
    "compassBucketName": "<compass_bucket_name>",
    "compassMongoHost": "<compass_mongo_host>",
    "compassRedisHost": "<compass_redis_host>",
    "reductoBucketName": "<reducto_bucket_name>",
    "reductoDatabaseUrl": "<reducto_database_url>",
    "reductoIrsaRoleArn": "<reducto_irsa_role_arn>",
    "reductoAzureVisionEndpoint": "<reducto_azure_vision_endpoint>",
    "reductoAzureVisionKey": "<reducto_azure_vision_key>",
    "codeBuildProjectName": "<codebuild_project_name>",
    "codeBuildS3Bucket": "<codebuild_s3_bucket>",
    "codeBuildRegistryUrl": "<codebuild_ecr_registry_url>",
    "codeBuildServiceRoleArn": "<codebuild_service_role_arn>",
    "cloudDeployEnabled": true,
    "dex": {
      "irsaRoleArn": "<duc_api_backend_irsa_role_arn>",
      "prefix": "duc-<workspace_id>"
    },
    "identities": {
      "sgpModels": {
        "irsaArn": "<sgp_models_irsa_arn>",
        "secretArns": {
          "backend": "<sgp_models_backend_secret_arn>",
          "model-providers": "<sgp_models_provider_secret_arn>"
        }
      }
    },
    "train": {
      "irsaRoleArn": "<train_irsa_role_arn>",
      "sagemakerExecutionRoleArn": "<train_sagemaker_execution_role_arn>",
      "sagemakerSecurityGroupId": "<train_sagemaker_security_group_id>",
      "databaseHost": "<train_database_host>",
      "dataBucket": "<train_s3_data_bucket_name>",
      "checkpointsBucket": "<train_s3_checkpoints_bucket_name>",
      "outputBucket": "<train_s3_output_bucket_name>",
      "stagingBucket": "<train_s3_staging_bucket_name>"
    },
    "registryProxy": {
      "irsaRoleArn": "<registry_proxy_irsa_role_arn>",
      "ecrRepositoryUrl": "<registry_proxy_ecr_repository_url>",
      "jwtSecret": "<registry_proxy_jwt_secret>",
      "workspaceConfig": "<workspace_config_json>"
    }
  },
  "frontDoorSSLCertB64": "<istio_certificate_b64_resolved>",
  "frontDoorSSLKeyB64": "<istio_key_b64_resolved>",
  "initialDesiredState": "<desired_state_json>",
  "datadog": {
    "enabled": true,
    "env": "<datadog_context>",
    "clusterName": "<cluster_name>",
    "secretName": "<datadog_secret_name>",
    "irsaRoleArn": "<datadog_irsa_role_arn>"
  }
}
```
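Because a malformed configuration only surfaces after a restart, a quick local check before updating the secret can be useful. The sketch below encodes only the rules stated on this page: the allowed `cloudProvider` and `authType` values, one provider block matching `cloudProvider`, and the `oci://` repository formats for AWS and GCP. The function names are hypothetical, and the checks are not System Manager's own validation.

```python
PROVIDERS = ("aws", "azure", "gcp")
AUTH_TYPES = ("saml", "oidc")


def aws_base_repository(account_id: str, region: str, workspace_id: str) -> str:
    """Build the AWS (ECR) form of baseRepository described above."""
    return (f"oci://{account_id}.dkr.ecr.{region}.amazonaws.com/"
            f"sgp-{workspace_id}-helm-repository")


def gcp_base_repository(region: str, project_id: str, workspace_id: str) -> str:
    """Build the GCP (Artifact Registry) form of baseRepository described above."""
    return f"oci://{region}-docker.pkg.dev/{project_id}/sgp-{workspace_id}-helm-repository"


def check_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    provider = config.get("cloudProvider")
    if provider not in PROVIDERS:
        problems.append(f"cloudProvider must be one of {PROVIDERS}, got {provider!r}")
    else:
        # The block named after the provider must be present.
        if not isinstance(config.get(provider), dict):
            problems.append(f"missing '{provider}' block for cloudProvider={provider!r}")
        # Blocks for the other providers should be absent.
        for other in PROVIDERS:
            if other != provider and other in config:
                problems.append(f"unexpected '{other}' block for cloudProvider={provider!r}")
    if config.get("authType") not in AUTH_TYPES:
        problems.append(f"authType must be one of {AUTH_TYPES}")
    if not str(config.get("baseRepository", "")).startswith("oci://"):
        problems.append("baseRepository should be an oci:// URL")
    return problems
```

A typical use would be to load the edited file with `json.load` and refuse to upload it if `check_config` returns anything.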

## Architecture

System Manager runs as a Deployment in the `sgp-system-manager` namespace and serves its GUI on port 8000. To reach the GUI from your workstation, forward the port:

```bash theme={null}
kubectl port-forward deployment/sgp-system-manager -n sgp-system-manager 8000:8000
```

Then navigate to `http://localhost:8000` in your browser to access the GUI.

```mermaid theme={null}
graph TD
    ADMIN(["Platform Admin"])

    subgraph CLUSTER["Kubernetes Cluster"]
        subgraph SM_NS["sgp-system-manager namespace"]
            GUI["Web GUI<br/>:8000"]
            API["REST API<br/>/api/v3"]
            WATCHER["Secret Watcher<br/>polls every 3s"]
            INSTALLER["Pack Installer"]
        end

        subgraph RESOURCES["Cluster Resources"]
            NS["Namespaces"]
            SEC["Secrets + Pull Secrets"]
            HR["HelmRelease resources"]
        end

        subgraph FLUX_NS["flux-system namespace"]
            FLUX["Flux Helm Controller"]
        end

        PODS["SGP Services<br/>(Helm-managed pods)"]
    end

    subgraph CLOUD["Cloud Provider"]
        SM_SECRET["Secret Manager<br/>desired-state · config"]
        HELM_REPO["Helm Repository<br/>OCI Artifact Registry"]
    end

    ADMIN -->|"browser"| GUI
    GUI --> API
    API -->|"read / write"| SM_SECRET
    WATCHER -->|"detect external changes"| SM_SECRET
    API --> INSTALLER
    INSTALLER --> NS
    INSTALLER --> SEC
    INSTALLER --> HR
    FLUX -->|"watches"| HR
    FLUX -->|"pull chart"| HELM_REPO
    FLUX -->|"deploy"| PODS
```
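The Secret Watcher in the diagram polls the secret manager every 3 seconds to detect changes made outside the GUI. The sketch below illustrates that polling pattern in general terms: it hashes each fetched value and reports it only when the hash differs from the previous poll. The `fetch` callable, the `max_polls` limit, and the use of SHA-256 are assumptions made for the example; this is not System Manager's actual implementation.

```python
import hashlib
import time
from typing import Callable, Iterator, Optional


def watch_secret(fetch: Callable[[], str],
                 interval_seconds: float = 3.0,
                 max_polls: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield the secret's contents each time they differ from the previous poll.

    `fetch` is a hypothetical callable that returns the current secret value,
    for example a thin wrapper around a cloud provider SDK call.
    """
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        value = fetch()
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        # The first poll only establishes a baseline; later differences are changes.
        if last_digest is not None and digest != last_digest:
            yield value
        last_digest = digest
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
```

Comparing digests rather than raw values keeps only a fixed-size fingerprint of the previous secret in memory between polls.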

### Packs

System Manager organizes services into "packs". Each pack is a collection of resources that are deployed together. A pack generally consists of [FluxCD HelmRelease](https://fluxcd.io/flux/components/helm/helmreleases/) resources plus any other resources needed to support a particular service.

When a pack is installed, System Manager renders its resource templates and writes the resulting Kubernetes resources (namespaces, secrets, and FluxCD HelmRelease resources) to the cluster. FluxCD then picks up the HelmRelease resources and handles pulling and deploying the Helm charts.
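To make the shape of a pack's output concrete, the sketch below builds a Namespace and a HelmRelease manifest as plain dictionaries. It is illustrative only: the function name, its parameters, the `5m` interval, and the assumption that the chart source is a `HelmRepository` in `flux-system` are not taken from System Manager's templates, and the HelmRelease `apiVersion` depends on the Flux version installed in the cluster.

```python
def render_pack(name: str, namespace: str, chart: str, repository_name: str,
                repository_namespace: str = "flux-system",
                interval: str = "5m") -> list:
    """Return the manifests a hypothetical single-chart pack might produce."""
    namespace_manifest = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace},
    }
    helm_release = {
        # Older Flux releases serve this kind as v2beta1 or v2beta2 instead.
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # How often Flux re-checks the release against this spec.
            "interval": interval,
            "chart": {
                "spec": {
                    "chart": chart,
                    # Points Flux at the repository the chart is pulled from.
                    "sourceRef": {
                        "kind": "HelmRepository",
                        "name": repository_name,
                        "namespace": repository_namespace,
                    },
                }
            },
        },
    }
    return [namespace_manifest, helm_release]
```

Each dictionary can be serialized with `json.dumps` and passed to `kubectl apply -f -`, which accepts JSON as well as YAML.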

### FluxCD

System Manager offloads resource reconciliation to FluxCD. FluxCD's Helm controller watches the HelmRelease resources that System Manager writes, installs and upgrades the corresponding Helm charts, and keeps the deployed releases in line with the declared state. For more information, see the [FluxCD HelmRelease documentation](https://fluxcd.io/flux/components/helm/helmreleases/).
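Since FluxCD performs the actual deployment, the status of each HelmRelease is a good place to look when a service does not come up. Flux records a `Ready` condition under `status.conditions`; the sketch below lists the releases whose `Ready` condition is not `"True"`, given the parsed output of `kubectl get helmreleases -A -o json`. The function names are hypothetical.

```python
def helm_release_ready(resource: dict) -> bool:
    """Return True if the HelmRelease reports a Ready condition with status "True"."""
    for condition in resource.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    # No Ready condition yet, e.g. the controller has not processed the resource.
    return False


def not_ready(release_list: dict) -> list:
    """Return "namespace/name" for every HelmRelease in the list that is not Ready."""
    return [
        f"{item['metadata']['namespace']}/{item['metadata']['name']}"
        for item in release_list.get("items", [])
        if not helm_release_ready(item)
    ]
```

The `message` field of the same condition usually explains why a release is not ready.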

### desired-state.json

The set of packs that System Manager deploys is defined in `desired-state.json`, which is also stored in the cloud provider's secret manager. To change the desired state, either update the secret directly or edit it in the System Manager GUI, then trigger reconciliation from the GUI. A sample desired state file is shown below:

```json theme={null}
{
  "version": "0.1",
  "packs": [
    { "name": "flux" },
    { "name": "sgp-helm-repository" },
    { "name": "istio" },
    { "name": "spicedb" },
    { "name": "identity-service" },
    { "name": "temporalf" },
    { "name": "sgp-apps" }
  ]
}
```
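When the secret is edited by hand, it is easy to introduce a duplicate entry or drop a comma. The sketch below shows one way to adjust the pack list programmatically before writing the secret back. It assumes only the structure visible in the sample above (a `packs` array of objects with a `name` key); the helper names and the `example-pack` name are hypothetical, and whether pack order is significant is not stated here, so the existing order is preserved.

```python
def pack_names(desired_state: dict) -> list:
    """Return the pack names in the order they are listed."""
    return [pack["name"] for pack in desired_state.get("packs", [])]


def add_pack(desired_state: dict, name: str) -> dict:
    """Return a copy with the pack appended, unless it is already listed."""
    packs = list(desired_state.get("packs", []))
    if name not in pack_names(desired_state):
        packs.append({"name": name})
    return {**desired_state, "packs": packs}


def remove_pack(desired_state: dict, name: str) -> dict:
    """Return a copy without the named pack; other entries keep their order."""
    packs = [p for p in desired_state.get("packs", []) if p["name"] != name]
    return {**desired_state, "packs": packs}
```

Both helpers return a new dictionary and leave the input untouched, so the original document can be kept for comparison or rollback. After writing the updated JSON to the secret, reconciliation still has to be triggered from the GUI as described above.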
