# Azure SGP Deployment

> End-to-end guide to deploying SGP in an Azure subscription.

## Overview

This guide walks you through deploying SGP into an Azure subscription using the SGP Azure Terraform modules. The SGP Azure infrastructure is defined by Terraform modules that Scale maintains.

## Prerequisites

* Access to an Azure subscription with sufficient permissions to create resources (Contributor + User Access Administrator roles, or equivalent)
* The following tools installed:
  * [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) (`az login` completed)
  * [Terraform](https://www.terraform.io/) (`>= 1.1.7, < 2.0.0`)
  * [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
* The following from Scale:
  * The SGP Azure Infrastructure Terraform modules (`azure-terraform/infra`)
  * A `workspace_id` and `registration_secret` unique to your deployment
* (Optional) A new SAML or OIDC application configured in your identity provider to authenticate to the SGP platform
* (Optional) A custom domain for your deployment

## Installation

### Step 1: Build Configuration

Configuration is split across three files so that domain-wide defaults can be shared across environments while each environment overrides only what differs.

```
infra/
├── main.tfvars.json                         # Selects which domain + environment to deploy
└── config/
    └── <domain_code>/
        ├── default.yaml                     # Domain-wide defaults (all environments)
        └── <location>/
            └── <environment_type>.yaml      # Environment-specific overrides
```
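As a rough illustration of how the three files relate, the sketch below derives the two YAML paths from the values in `main.tfvars.json`. The function name `config_paths` is hypothetical, and the Terraform modules may resolve paths differently; the code only mirrors the directory layout shown above.

```python
import json
from pathlib import PurePosixPath


def config_paths(tfvars: dict, root: str = "infra") -> list[str]:
    """Return the YAML files implied by main.tfvars.json, defaults first."""
    base = PurePosixPath(root) / "config" / tfvars["domain_code"]
    return [
        str(base / "default.yaml"),
        str(base / tfvars["location"] / f"{tfvars['environment_type']}.yaml"),
    ]


# Example values only; substitute the contents of your own main.tfvars.json.
example = json.loads(
    '{"environment_type": "dev", "domain_code": "acme", "location": "eastus2"}'
)
for path in config_paths(example):
    print(path)
```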

#### File 1: `main.tfvars.json`

This file selects which configuration to load. Before running Terraform, set its three fields to the domain, region, and environment you want to deploy.

```json theme={null}
{
  "environment_type": "<dev|qa|staging|uat|prod>",
  "domain_code": "<A unique 3-9 alphanumeric character string to identify yourself>",
  "location": "<Azure region>"
}
```

| Field              | Description                                          | Allowed values                                 |
| ------------------ | ---------------------------------------------------- | ---------------------------------------------- |
| `environment_type` | Environment tier                                     | `dev`, `qa`, `staging`, `uat`, `prod`          |
| `domain_code`      | Customer identifier (3–9 alphanumeric chars/hyphens) | e.g. `acme`, `contoso`                         |
| `location`         | Azure region                                         | e.g. `eastus2`, `northeurope`, `southeastasia` |
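
A quick pre-flight check can catch typos in these fields before Terraform runs. The following is a minimal sketch, not part of the SGP modules: `validate_tfvars` is a hypothetical helper, and because the placeholder text and the table differ on whether hyphens are permitted in `domain_code`, it takes the stricter reading unless `allow_hyphens=True` is passed.

```python
import re

ALLOWED_ENVIRONMENTS = {"dev", "qa", "staging", "uat", "prod"}


def validate_tfvars(tfvars: dict, allow_hyphens: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    if tfvars.get("environment_type") not in ALLOWED_ENVIRONMENTS:
        allowed = ", ".join(sorted(ALLOWED_ENVIRONMENTS))
        problems.append(f"environment_type must be one of: {allowed}")
    # Assumption: hyphens are rejected unless the caller opts in.
    pattern = r"[A-Za-z0-9-]{3,9}" if allow_hyphens else r"[A-Za-z0-9]{3,9}"
    if not re.fullmatch(pattern, tfvars.get("domain_code") or ""):
        problems.append("domain_code must be 3-9 permitted characters")
    if not tfvars.get("location"):
        problems.append("location must be set to an Azure region")
    return problems
```

Load the file with `json.load` and pass the resulting dict; any returned messages point at the field to fix.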

#### File 2: `config/<domain_code>/default.yaml`

Customer-wide baseline settings that apply to **all** environments unless overridden. This file is the right place for stable policy and posture decisions: tenant identity, tagging standards, and security baselines for Key Vault, PostgreSQL, Redis, Storage, and other services.

```yaml theme={null}
# Identity
business_unit: "<your_business_unit>"
tenant_id: "<your_azure_tenant_id>"

default_tags:
  costcenter: "<cost_center>"
  product: "<domain_code>"
  deploymenttype: "new"
  safe-to-delete: "no"

# CMK key names (Terraform creates these keys when cmk.create=true in the env YAML)
cmk:
  create: true
  encryption_key_name: "encryptionCMK"
  k8s_key_name: "k8sEncryptionCMK"
  service_bus_key_name: "serviceBusEncryptionCMK"

# Key Vault network posture — secure-by-default baseline.
# Override public_network_access_enabled and network_acls_ip_rules in your env YAML
# if you need temporary public access during bootstrap from a laptop or CI runner.
keyvault:
  public_network_access_enabled: false
  enable_rbac_authorization: true
  sku_name: "standard"
  soft_delete_retention_days: 90    # Cannot be changed after vault creation
  purge_protection_enabled: true
  network_acls_default_action: "Deny"
  network_acls_bypass: "AzureServices"
  network_acls_ip_rules: []

# PostgreSQL baseline
psql:
  public_network_access_enabled: false
  sku_name: "GP_Standard_D2s_v3"
  psql_version: "15"
  backup_retention_days: 7
  bootstrap_aad_principals: false
  auth:
    active_directory_auth_enabled: true
    password_auth_enabled: true
  storage_mb: 32768
  storage_tier: "P10"
  auto_grow_enabled: true
  azure_extensions: "uuid-ossp,ltree"

# Redis baseline
redis:
  public_network_access_enabled: false
  sku_name: "Premium"
  family: "P"
  capacity: 1
  active_directory_authentication_enabled: true
  access_policy_name: "Data Owner"
  non_ssl_port_enabled: false
  tls_version: 1.2
  redis_version: 6

# Storage account baseline
storage_account:
  public_network_access_enabled: false
  min_tls_version: "TLS1_2"
  account_tier: "Standard"
  account_replication_type: "LRS"
  allow_nested_items_to_be_public: false

# OpenAI — leave client.mode empty to let Terraform infer from open_ai.create
open_ai:
  client:
    mode: ""
    key: ""
    org_id: ""
    custom_url: ""

# NAT Gateway baseline
nat_gateway:
  sku_name: "Standard"
  public_ip_sku: "Standard"
  public_ip_allocation_method: "Static"
  zones: ["1", "2", "3"]
  idle_timeout_in_minutes: 10

# AKS baseline
aks:
  load_balancer_sku: "standard"
  only_critical_addons_enabled: true
  key_vault_secret_rotation_enabled: true

# Front Door baseline
frontdoor:
  create: true
  ssl_mode: managed
  minimum_tls_version: "TLS12"
  forwarding_protocol: HttpOnly
  private_link_service_name: "sgp-ingress-lb"
  subdomains: ["auth", "api", "@", "admin"]
  extra_subdomains: []
  dns_ttl_seconds: 60
  ip_filtering:
    enabled: false
    allowed_ip_ranges: []

# System Manager bootstrapping baseline
bootstrapping:
  az_cli_version: "2.59.0"
  helm_chart_version: "2.1.0"
  base_repository: "<your_container_registry_url>"
```
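
One way to picture "applies to all environments unless overridden" is a recursive merge in which environment values win. The sketch below assumes that behavior; the actual merge logic lives in the Terraform modules and may differ (for example, in how lists are handled). `deep_merge` is a hypothetical name, and the sample values are taken from the example YAML in this guide.

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where `overrides` wins; nested dicts merge key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale (an assumption).
            merged[key] = value
    return merged


defaults = {"keyvault": {"public_network_access_enabled": False, "sku_name": "standard"}}
env = {"keyvault": {"name": "sgpazacme01keyvault", "public_network_access_enabled": True}}
effective = deep_merge(defaults, env)
print(effective["keyvault"])
```

Under this reading, an environment file only needs to list the keys that differ from the baseline.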

#### File 3: `config/<domain_code>/<location>/<environment_type>.yaml`

Environment-specific configuration. This is where you set everything that differs per environment: resource names, subscription, CIDRs, feature flags, node pool sizing, and bootstrapping state.

```yaml theme={null}
# ── Environment identity ───────────────────────────────────────────────────────
location: eastus2
subscription_id: "<your_azure_subscription_id>"
name_suffix: "<workspace_id>"        # e.g. "acme01"
deployment_id: "<workspace_id>"
deployment_url: "<workspace_id>.workspace.egp.scale.com"
temporal_db_mode: postgresCommonHosted  # Recommended; or "cassandraK8s" (not recommended)

# ── Auth / secrets ────────────────────────────────────────────────────────────
authType: default    # Options: default, SAML, OIDC
RBAC: false
registration_secret: ""     # Auto-generated if empty
identity_service_jwt: ""    # Auto-generated if empty

# ── OIDC (required only when authType = OIDC) ─────────────────────────────────
oidc:
  clientId: ""
  clientSecret: ""
  issuer: ""
  authorizationUrl: ""
  tokenUrl: ""
  userInfoUrl: ""

# ── SAML (required only when authType = SAML) ─────────────────────────────────
saml:
  X509Cert: ""
  SSOUrl: ""
  emailAttrName: ""
  firstNameAttrName: ""
  lastNameAttrName: ""

# ── SSL ───────────────────────────────────────────────────────────────────────
sslMode: e2e    # Options: e2e, managed

# SSL cert/key for Istio ingress (PEM file paths or base64). Required when sslMode=e2e.
istio:
  sslCert: "<path_to_pem_certificate>"
  sslKey: "<path_to_pem_private_key>"

# ── Resource group + networking ───────────────────────────────────────────────
resource_group: "rg-eus2-<domain_code>-<workspace_id>-dev-01"

network:
  vnet: "sgpaz<workspace_id>-vnet"
  int_rt_name: "sgpaz<workspace_id>-rt-int"
  int_nsg_name: "sgpaz<workspace_id>-nsg-int"
  private_cluster: true
  nsg_rules:
    postgresql: true
    redis: true

private_dns:
  create: true
  private_cluster: true

subnets:
  psql:
    name: "sgpaz<workspace_id>-snet-psql"
    type: int
    address_prefixes:
      - "<cidr_block>"    # e.g. "10.95.23.160/27"
    service_endpoints:
      - Microsoft.Storage
    service_delegation:
      name: Microsoft.DBforPostgreSQL/flexibleServers
      actions:
        - Microsoft.Network/virtualNetworks/subnets/join/action

  deployment_scripts:
    name: "sgpaz<workspace_id>-snet-deployscripts"
    type: int
    address_prefixes:
      - "<cidr_block>"
    service_delegation:
      name: Microsoft.ContainerInstance/containerGroups
      actions:
        - Microsoft.Network/virtualNetworks/subnets/action

  aks:
    name: "sgpaz<workspace_id>-snet-aks"
    type: int
    address_prefixes:
      - "<cidr_block>"
    private_endpoint_network_policies: Disabled
    service_endpoints:
      - Microsoft.CognitiveServices

  bastion:
    name: AzureBastionSubnet
    type: int
    address_prefixes:
      - "<cidr_block>"

  jump_host:
    name: "sgpaz<workspace_id>-snet-jumphost"
    type: int
    address_prefixes:
      - "<cidr_block>"

# ── Workload Identity + NAT Gateway ───────────────────────────────────────────
workload_identities:
  enabled: true

nat_gateway:
  enabled: true
  name: "sgpaz<workspace_id>nat"
  zones: []    # [] = regional (no zone pinning)

# ── CMK ───────────────────────────────────────────────────────────────────────
# Set create=true for new deployments. If CMK keys already exist, set create=false
# and provide the key URIs to avoid "already exists (import required)" errors.
cmk:
  enabled: true
  create: true
  # encryption_key_uri: "https://<keyvault>.vault.azure.net/keys/encryptionCMK/<version>"
  # k8s_key_uri: "https://<keyvault>.vault.azure.net/keys/k8sEncryptionCMK/<version>"
  # service_bus_key_uri: "https://<keyvault>.vault.azure.net/keys/serviceBusEncryptionCMK/<version>"

# ── Key Vault ─────────────────────────────────────────────────────────────────
# Override public access settings here if bootstrapping from a laptop or CI runner.
keyvault:
  name: "sgpaz<workspace_id>keyvault"    # Must be globally unique; 3-24 characters
  soft_delete_retention_days: 90          # Must match existing vault if already created
  public_network_access_enabled: true     # Set false once stable; keep true during bootstrap
  network_acls_ip_rules:
    - "<your_public_ip>/32"              # Your laptop or CI runner IP

# ── Core services ─────────────────────────────────────────────────────────────
law:
  create: true

psql:
  create: true
  name: "sgpaz<workspace_id>postgres"    # Must be globally unique
  admin_user: postgres
  bootstrap_aad_principals: true         # Runs once after cluster is up; requires kubectl access

redis:
  create: true
  name: "sgpaz<workspace_id>redis"       # Must be globally unique

storage_account:
  create: true
  name: "sgpaz<workspace_id>storage"    # Must be globally unique; no hyphens
  enable_file_private_endpoint: true

service_bus:
  create: true
  name: "sgpaz<workspace_id>servicebus"
  sku: Premium                            # Premium required for private endpoints
  capacity: 1
  premium_messaging_partitions: 1
  private_endpoint_enabled: true
  public_network_access: false

# ── AI services ───────────────────────────────────────────────────────────────
# Modes: use_openai_via_azure | use_openai_via_custom_endpoint | no_openai
# NOTE: Do not commit real API keys. Keep `key` as a local uncommitted change.
open_ai:
  create: false
  name: "sgpaz<workspace_id>openai"
  client:
    mode: "use_openai_via_custom_endpoint"
    key: "<openai_api_key>"
    custom_url: "<your_azure_openai_endpoint>"

ai_search:
  create: false
  name: "sgpaz<workspace_id>aisearch"

# ── AKS ───────────────────────────────────────────────────────────────────────
aks:
  create: true
  name: "sgpaz<workspace_id>aks"
  node_resource_group: "rg-aks-eus2-<domain_code>-<workspace_id>-dev-01"
  dns_prefix: "egp-k8scluster"
  sku: Free                               # Free | Standard (use Standard for production)
  kubernetes_version: "1.30"
  run_command_enabled: true
  enable_encryption_at_host: true
  use_azure_managed_flux: true
  zones: ["1", "2", "3"]
  istio:
    enabled: true
    ingress_mode: External
    revisions:
      - asm-1-20
  node_pools:
    system:
      name: default
      vm_size: Standard_D4s_v3
      count: 3
      min_count: 3
      max_count: 6
    user:
      enabled: true
      name: user
      vm_size: Standard_D16s_v3
      count: 3
      min_count: 3
      max_count: 10
    gpu:
      enabled: false                      # Set true to enable GPU workloads
      name: gpu
      vm_size: Standard_NV72ads_A10_v5
      min_count: 0
      max_count: 5
    cassandra:
      enabled: false                      # Required only if temporal_db_mode is cassandraK8s
  network:
    pod_cidr: "10.244.0.0/16"
    service_cidr: "10.243.0.0/16"
    dns_service_ip: "10.243.0.10"
    outbound_type: userDefinedRouting

# ── Bastion / jump host ───────────────────────────────────────────────────────
bastion:
  create: true
  admin_username: azureuser
  ssh_public_key: "<your_ssh_public_key>"

# ── Front Door overrides ──────────────────────────────────────────────────────
# Most Front Door settings inherit from default.yaml. Override only what differs.
frontdoor:
  extra_subdomains: []    # e.g. ["chat"] if Agentex UI is enabled

# ── Feature flags ─────────────────────────────────────────────────────────────
feature_flags:
  models: false
  agentex:
    create: false    # Set true to provision Agentex service
  compass:
    create: false    # Set true to provision Workflows service
  dex:
    create: false    # Set true to provision Dex (Document Understanding) service
  reducto:
    create: false    # Set true to provision Reducto service

# ── Bootstrapping (System Manager) ────────────────────────────────────────────
bootstrapping:
  enabled: true
  use_managed_flux: true
  system_manager_version: "<system_manager_image_tag>"
  desiredState: |
    {
      "version": "0.1",
      "packs": [
        { "name": "cert-manager" },
        { "name": "egp" },
        { "name": "identity-service" },
        { "name": "spicedb" },
        { "name": "sgp-apps" },
        { "name": "sgp-models" }
      ]
    }

# ── Policy assignments ────────────────────────────────────────────────────────
policy_assignments:
  enabled: false    # Set true to enforce Azure Policy tag rules

# ── Observability ─────────────────────────────────────────────────────────────
observiqidp:
  monitoring:
    enabled: false
  insights:
    enabled: false
```
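
Because `bootstrapping.desiredState` is JSON embedded in a YAML block scalar, a stray comma or quote is easy to miss. The sketch below is a hypothetical pre-apply check that parses the string and lists the pack names; it assumes only the `packs`/`name` structure shown in the example and knows nothing else about the System Manager schema.

```python
import json


def pack_names(desired_state: str) -> list[str]:
    """Parse a desiredState JSON string and return its pack names in order."""
    document = json.loads(desired_state)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(document, dict) or not isinstance(document.get("packs"), list):
        raise ValueError("desiredState must be an object with a 'packs' list")
    names = []
    for pack in document["packs"]:
        if not isinstance(pack, dict) or not pack.get("name"):
            raise ValueError(f"pack entry without a name: {pack!r}")
        names.append(pack["name"])
    return names
```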

<Note>
  **Naming constraints for Azure resources:**

  * Key Vault names: 3–24 alphanumeric characters and hyphens, globally unique
  * Storage Account names: 3–24 lowercase letters and numbers only (no hyphens), globally unique
  * PostgreSQL and Redis names: globally unique within Azure
  * All names must remain stable after first apply — many Azure resources cannot be renamed
</Note>
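
These limits constrain `workspace_id` when you follow the `sgpaz<workspace_id>...` patterns from the example YAML: `sgpaz` plus `keyvault` already uses 13 of Key Vault's 24 characters, leaving at most 11 for `workspace_id`. The sketch below checks only the two rules stated in the note; `check_names` is a hypothetical helper, and Azure enforces further rules (for example, Key Vault names must start with a letter) that it does not cover.

```python
import re

# Patterns transcribed from the note above, not from the Azure API.
KEY_VAULT_NAME = re.compile(r"[A-Za-z0-9-]{3,24}")
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def check_names(workspace_id: str) -> dict[str, bool]:
    """Report whether the example name patterns satisfy the documented limits."""
    keyvault = f"sgpaz{workspace_id}keyvault"
    storage = f"sgpaz{workspace_id}storage"
    return {
        "keyvault": bool(KEY_VAULT_NAME.fullmatch(keyvault)),
        "storage_account": bool(STORAGE_ACCOUNT_NAME.fullmatch(storage)),
    }
```

For instance, a `workspace_id` containing a hyphen passes the Key Vault rule but fails the Storage Account rule.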

### Step 2: Provision Infrastructure via Terraform

Navigate to the `infra` directory and initialize:

```bash theme={null}
cd infra

terraform init
```

Review and apply the plan using the `main.tfvars.json` file you configured in the previous step:

```bash theme={null}
terraform plan -var-file=main.tfvars.json -out=tfplan

# Inspect the plan to review resources before proceeding
terraform show tfplan

terraform apply tfplan
```

If the Datadog monitoring integration is enabled in your YAML, pass the API key through an environment variable rather than committing it to a file:

```bash theme={null}
TF_VAR_datadog_api_key="<your_key>" terraform plan -var-file=main.tfvars.json -out=tfplan
```
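
If you drive Terraform from a script or CI job, the same pattern can be expressed by injecting the key into the child process environment only. This is a minimal sketch under that assumption; `terraform_plan` is a hypothetical helper, and the key would come from your own secret store.

```python
import os
from typing import Optional


def terraform_plan(
    datadog_api_key: Optional[str] = None,
    base_env: Optional[dict] = None,
) -> tuple[list[str], dict]:
    """Build the argv and environment for `terraform plan` without touching disk."""
    env = dict(os.environ if base_env is None else base_env)
    if datadog_api_key:
        # Terraform reads variables from TF_VAR_<name> environment variables.
        env["TF_VAR_datadog_api_key"] = datadog_api_key
    argv = ["terraform", "plan", "-var-file=main.tfvars.json", "-out=tfplan"]
    return argv, env
```

Pass the result to `subprocess.run(argv, env=env, check=True)` from inside the `infra` directory.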

*This step can take 30–60 minutes because of resource creation dependencies, particularly the AKS cluster and PostgreSQL Flexible Server.*

### Step 3: Bootstrap the Cluster

The Azure infrastructure **automatically bootstraps** [SGP System Manager](/docs/infrastructure/system-manager) via an Azure Deployment Script. When `bootstrapping.enabled: true` is set in your configuration, Terraform provisions an Azure Container Instance that:

1. Installs Flux CD on the AKS cluster (using Azure Managed Flux if `use_managed_flux: true`)
2. Applies the System Manager `HelmRepository` and `HelmRelease` Flux custom resources
3. Waits for System Manager to reconcile

Monitor bootstrap progress in the Azure Portal under **Deployment Scripts** in your resource group, or check System Manager logs after bootstrap:

```bash theme={null}
# Get AKS credentials (from inside the VNet or via Bastion — cluster is private by default)
az aks get-credentials \
  --resource-group <resource_group> \
  --name sgpaz<workspace_id>aks \
  --overwrite-existing

# Verify System Manager is running
kubectl get pods -n sgp-system-manager

# Watch System Manager deploy remaining services via Flux
kubectl get helmreleases -A
```

#### Accessing the Private AKS Cluster

Because the AKS cluster is private by default, you must access it from within the provisioned VNet. Two options are provided:

**Option A: From the jump host VM (via Azure Bastion)**

The Bastion host and jump host VM are provisioned when `bastion.create: true`. Connect via the Azure Portal (Bastion blade) or using the helper script:

```bash theme={null}
python3 scripts/connect_private_aks.py --run-mode=local --bootstrap-jump-host
```

**Option B: Using `az aks command invoke`**

When `aks.run_command_enabled: true`, you can run kubectl commands through the Azure control plane, without VPN or other network access to the VNet:

```bash theme={null}
az aks command invoke \
  --resource-group <resource_group> \
  --name sgpaz<workspace_id>aks \
  --command "kubectl get helmreleases -A"
```
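
When scripting several such checks, it can help to build the `az` arguments in one place. The sketch below is a hypothetical wrapper that assumes the cluster follows the `sgpaz<workspace_id>aks` naming used in this guide; it only assembles the argument list and leaves execution to the caller.

```python
import shlex


def aks_invoke_argv(resource_group: str, workspace_id: str, kubectl_args: list[str]) -> list[str]:
    """Build the argv for `az aks command invoke` running one kubectl command."""
    command = shlex.join(["kubectl", *kubectl_args])  # quotes arguments safely
    return [
        "az", "aks", "command", "invoke",
        "--resource-group", resource_group,
        "--name", f"sgpaz{workspace_id}aks",
        "--command", command,
    ]
```

For example, `subprocess.run(aks_invoke_argv("<resource_group>", "<workspace_id>", ["get", "pods", "-A"]), check=True)` runs the pod listing remotely.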

### Step 4: Configure DNS

After `terraform apply` completes, retrieve the Front Door endpoint hostname and the DNS zone ID:

```bash theme={null}
terraform output frontdoor_endpoint_host_name
terraform output frontdoor_dns_zone_id
```

Configure a CNAME record in your DNS provider pointing your `deployment_url` to the Front Door endpoint (the `azurefd.net` hostname from the output above).

<Note>
  If using Azure DNS (the DNS zone is managed by Terraform when `frontdoor.create: true`), CNAME records are created automatically. Verify with:

  ```bash theme={null}
  az network dns record-set list \
    --resource-group <resource_group> \
    --zone-name <deployment_url>
  ```
</Note>
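
To see which hostnames need records, you can expand the `frontdoor.subdomains` list against your `deployment_url`. This sketch assumes each subdomain is prefixed to `deployment_url` (consistent with the `auth.<deployment_url>` URLs used for identity provider setup) and that `"@"` denotes the zone apex, as in DNS zone-file notation; both are assumptions about the modules, and `frontdoor_hostnames` is a hypothetical name.

```python
def frontdoor_hostnames(deployment_url: str, subdomains: list[str]) -> list[str]:
    """Expand subdomain labels into fully qualified hostnames."""
    return [
        deployment_url if label == "@" else f"{label}.{deployment_url}"
        for label in subdomains
    ]


for host in frontdoor_hostnames("acme01.workspace.egp.scale.com", ["auth", "api", "@", "admin"]):
    print(host)
```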

### Step 5: Verify the Deployment

Wait for all services to be ready:

```bash theme={null}
kubectl get helmreleases -A  # All should show Ready=True
kubectl get pods -A          # All should be Running or Completed
```

System Manager continuously reconciles the desired state. The `bootstrapping.desiredState` value in your environment YAML is written to a secret in Azure Key Vault during `terraform apply`, and System Manager reads from that secret at runtime. If a HelmRelease shows `Ready=False`, check its events:

```bash theme={null}
kubectl describe helmrelease <name> -n <namespace>
```
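
For an unattended check, the readiness test can be applied to the JSON form of the same listing. The sketch below assumes input produced by `kubectl get helmreleases -A -o json` and relies on the `Ready` condition that Flux records under `status.conditions`; `not_ready` is a hypothetical helper.

```python
import json


def not_ready(helmreleases_json: str) -> list[str]:
    """Return "namespace/name" for each HelmRelease lacking a Ready=True condition."""
    items = json.loads(helmreleases_json).get("items", [])
    pending = []
    for item in items:
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            meta = item["metadata"]
            pending.append(f"{meta['namespace']}/{meta['name']}")
    return pending
```

An empty result means every release reports `Ready=True`; anything else names the releases to inspect with `kubectl describe`.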

### Step 6: Configure Identity Provider

#### SAML Configuration

Set `authType: "SAML"` in your environment YAML, then configure your Identity Provider with:

* **Service Entity ID**: `https://auth.<deployment_url>`
* **Redirect URI**: `https://auth.<deployment_url>/dashboard/org/saml/callback`

Update the `is-saml-secret` secret in Key Vault (or via the System Manager GUI):

```json theme={null}
{
  "id": "<workspace_id>",
  "samlConfiguration": {
    "entityId": "<entity_id_from_your_identity_provider>",
    "x509Cert": "<x509_certificate_from_your_identity_provider>",
    "ssoUrl": "<sso_url_from_your_identity_provider>",
    "attributeMappings": {
      "email": "<email_attribute_name>",
      "firstName": "<first_name_attribute_name>",
      "lastName": "<last_name_attribute_name>"
    }
  }
}
```
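
If you already keep SAML values in the `saml` block of your environment YAML, the secret body can be generated rather than hand-edited. The field mapping below (for example, `X509Cert` to `x509Cert`) is an assumption inferred from the matching names, and `saml_secret` is a hypothetical helper; `entity_id` has no counterpart in the YAML shown and must come from your identity provider.

```python
import json


def saml_secret(workspace_id: str, entity_id: str, saml: dict) -> str:
    """Render the is-saml-secret JSON body from environment-YAML style keys."""
    payload = {
        "id": workspace_id,
        "samlConfiguration": {
            "entityId": entity_id,
            "x509Cert": saml["X509Cert"],
            "ssoUrl": saml["SSOUrl"],
            "attributeMappings": {
                "email": saml["emailAttrName"],
                "firstName": saml["firstNameAttrName"],
                "lastName": saml["lastNameAttrName"],
            },
        },
    }
    return json.dumps(payload, indent=2)
```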

#### OIDC Configuration

Set `authType: "OIDC"` in your environment YAML, then configure your Identity Provider with:

* **Redirect URI**: `https://auth.<deployment_url>/dashboard/org/oidc/callback`

Update the `is-oidc-secret` secret in Key Vault (or via the System Manager GUI):

```json theme={null}
{
  "id": "<workspace_id>",
  "oidcConfiguration": {
    "clientId": "<client_id_from_your_identity_provider>",
    "clientSecret": "<client_secret_from_your_identity_provider>",
    "issuer": "<issuer_from_your_identity_provider>",
    "authorizationUrl": "<authorization_url>",
    "tokenUrl": "<token_url>",
    "userInfoUrl": "<user_info_url>"
  }
}
```
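
Before uploading the secret, it is worth confirming that no field is empty or still holds a `<placeholder>` from the template. The sketch below is a hypothetical check that treats all six fields shown above as required, which is an assumption rather than a documented rule.

```python
REQUIRED_OIDC_FIELDS = (
    "clientId", "clientSecret", "issuer",
    "authorizationUrl", "tokenUrl", "userInfoUrl",
)


def _unfilled(value) -> bool:
    """True if the value is empty or still looks like a <placeholder>."""
    text = str(value or "").strip()
    return not text or (text.startswith("<") and text.endswith(">"))


def unfilled_oidc_fields(secret: dict) -> list[str]:
    """List the fields of an is-oidc-secret body that still need real values."""
    config = secret.get("oidcConfiguration") or {}
    missing = ["id"] if _unfilled(secret.get("id")) else []
    missing += [name for name in REQUIRED_OIDC_FIELDS if _unfilled(config.get(name))]
    return missing
```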

After modifying an identity secret, restart System Manager to apply the changes:

```bash theme={null}
kubectl rollout restart deployment sgp-system-manager -n sgp-system-manager
```

## Accessing the Platform

Once the deployment is verified, open the SGP platform at `https://<workspace_id>.workspace.egp.scale.com` (or your custom domain) and authenticate through the configured identity provider.
