
Overview

This guide walks you through deploying SGP into an Azure subscription using the SGP Azure infrastructure Terraform modules, which are maintained by Scale.

Prerequisites

  • Access to an Azure subscription with sufficient permissions to create resources (Contributor + User Access Administrator roles, or equivalent)
  • The following tools installed:
    • Terraform
    • The Azure CLI (az)
    • kubectl
  • The following from Scale:
    • The SGP Azure Infrastructure Terraform modules (azure-terraform/infra)
    • A workspace_id and registration_secret unique to your deployment
  • A new application configured in your identity provider to authenticate to the SGP platform (SAML or OIDC) (optional)
  • A custom domain for your deployment (optional)

Installation

Step 1: Build Configuration

Configuration is split across three files. This allows you to reuse the same configuration for multiple environments.
infra/
├── main.tfvars.json                         # Selects which domain + environment to deploy
└── config/
    └── <domain_code>/
        ├── default.yaml                     # Domain-wide defaults (all environments)
        └── <location>/
            └── <environment_type>.yaml      # Environment-specific overrides
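The environment YAML overrides default.yaml key by key. As a minimal sketch of the override semantics (illustrative only; the actual merge is performed inside Scale's Terraform modules and may differ in detail):

```python
# Illustrative deep-merge: env YAML values win, unset keys fall through to default.yaml.

def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied; nested dicts merge, scalars replace."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

default_cfg = {"keyvault": {"public_network_access_enabled": False,
                            "sku_name": "standard"}}
env_cfg = {"keyvault": {"public_network_access_enabled": True}}

effective = deep_merge(default_cfg, env_cfg)
# public_network_access_enabled comes from the env file;
# sku_name falls through from default.yaml
```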

File 1: main.tfvars.json

This file selects which configuration to load. Edit it to point at your customer and environment before running Terraform.
{
  "environment_type": "<dev|qa|staging|uat|prod>",
  "domain_code": "<A unique 3-9 alphanumeric character string to identify yourself>",
  "location": "<Azure region>"
}
  • environment_type (environment tier): dev, qa, staging, uat, prod
  • domain_code (customer identifier, 3-9 alphanumeric characters/hyphens): e.g. acme, contoso
  • location (Azure region): e.g. eastus2, northeurope, southeastasia
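For example, a hypothetical customer using domain code acme and deploying a dev environment in eastus2 would use:

```json
{
  "environment_type": "dev",
  "domain_code": "acme",
  "location": "eastus2"
}
```

Terraform would then load config/acme/default.yaml plus config/acme/eastus2/dev.yaml.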

File 2: config/<domain_code>/default.yaml

Customer-wide baseline settings that apply to all environments unless overridden. This file is the right place for stable policy and posture decisions: tenant identity, tagging standards, and security baselines for Key Vault, PostgreSQL, Redis, Storage, and other services.
# Identity
business_unit: "<your_business_unit>"
tenant_id: "<your_azure_tenant_id>"

default_tags:
  costcenter: "<cost_center>"
  product: "<domain_code>"
  deploymenttype: "new"
  safe-to-delete: "no"

# CMK key names (Terraform creates these keys when cmk.create=true in the env YAML)
cmk:
  create: true
  encryption_key_name: "encryptionCMK"
  k8s_key_name: "k8sEncryptionCMK"
  service_bus_key_name: "serviceBusEncryptionCMK"

# Key Vault network posture — secure-by-default baseline.
# Override public_network_access_enabled and network_acls_ip_rules in your env YAML
# if you need temporary public access during bootstrap from a laptop or CI runner.
keyvault:
  public_network_access_enabled: false
  enable_rbac_authorization: true
  sku_name: "standard"
  soft_delete_retention_days: 90    # Cannot be changed after vault creation
  purge_protection_enabled: true
  network_acls_default_action: "Deny"
  network_acls_bypass: "AzureServices"
  network_acls_ip_rules: []

# PostgreSQL baseline
psql:
  public_network_access_enabled: false
  sku_name: "GP_Standard_D2s_v3"
  psql_version: "15"
  backup_retention_days: 7
  bootstrap_aad_principals: false
  auth:
    active_directory_auth_enabled: true
    password_auth_enabled: true
  storage_mb: 32768
  storage_tier: "P10"
  auto_grow_enabled: true
  azure_extensions: "uuid-ossp,ltree"

# Redis baseline
redis:
  public_network_access_enabled: false
  sku_name: "Premium"
  family: "P"
  capacity: 1
  active_directory_authentication_enabled: true
  access_policy_name: "Data Owner"
  non_ssl_port_enabled: false
  tls_version: 1.2
  redis_version: 6

# Storage account baseline
storage_account:
  public_network_access_enabled: false
  min_tls_version: "TLS1_2"
  account_tier: "Standard"
  account_replication_type: "LRS"
  allow_nested_items_to_be_public: false

# OpenAI — leave client.mode empty to let Terraform infer from open_ai.create
open_ai:
  client:
    mode: ""
    key: ""
    org_id: ""
    custom_url: ""

# NAT Gateway baseline
nat_gateway:
  sku_name: "Standard"
  public_ip_sku: "Standard"
  public_ip_allocation_method: "Static"
  zones: ["1", "2", "3"]
  idle_timeout_in_minutes: 10

# AKS baseline
aks:
  load_balancer_sku: "standard"
  only_critical_addons_enabled: true
  key_vault_secret_rotation_enabled: true

# Front Door baseline
frontdoor:
  create: true
  ssl_mode: managed
  minimum_tls_version: "TLS12"
  forwarding_protocol: HttpOnly
  private_link_service_name: "sgp-ingress-lb"
  subdomains: ["auth", "api", "@", "admin"]
  extra_subdomains: []
  dns_ttl_seconds: 60
  ip_filtering:
    enabled: false
    allowed_ip_ranges: []

# System Manager bootstrapping baseline
bootstrapping:
  az_cli_version: "2.59.0"
  helm_chart_version: "2.1.0"
  base_repository: "<your_container_registry_url>"

File 3: config/<domain_code>/<location>/<environment_type>.yaml

Environment-specific configuration. This is where you set everything that differs per environment: resource names, subscription, CIDRs, feature flags, node pool sizing, and bootstrapping state.
# ── Environment identity ───────────────────────────────────────────────────────
location: eastus2
subscription_id: "<your_azure_subscription_id>"
name_suffix: "<workspace_id>"        # e.g. "acme01"
deployment_id: "<workspace_id>"
deployment_url: "<workspace_id>.workspace.egp.scale.com"
temporal_db_mode: postgresCommonHosted  # Recommended; or "cassandraK8s" (not recommended)

# ── Auth / secrets ────────────────────────────────────────────────────────────
authType: default    # Options: default, SAML, OIDC
RBAC: false
registration_secret: ""     # Auto-generated if empty
identity_service_jwt: ""    # Auto-generated if empty

# ── OIDC (required only when authType = OIDC) ─────────────────────────────────
oidc:
  clientId: ""
  clientSecret: ""
  issuer: ""
  authorizationUrl: ""
  tokenUrl: ""
  userInfoUrl: ""

# ── SAML (required only when authType = SAML) ─────────────────────────────────
saml:
  X509Cert: ""
  SSOUrl: ""
  emailAttrName: ""
  firstNameAttrName: ""
  lastNameAttrName: ""

# ── SSL ───────────────────────────────────────────────────────────────────────
sslMode: e2e    # Options: e2e, managed

# SSL cert/key for Istio ingress (PEM file paths or base64). Required when sslMode=e2e.
istio:
  sslCert: "<path_to_pem_certificate>"
  sslKey: "<path_to_pem_private_key>"

# ── Resource group + networking ───────────────────────────────────────────────
resource_group: "rg-eus2-<domain_code>-<workspace_id>-dev-01"

network:
  vnet: "sgpaz<workspace_id>-vnet"
  int_rt_name: "sgpaz<workspace_id>-rt-int"
  int_nsg_name: "sgpaz<workspace_id>-nsg-int"
  private_cluster: true
  nsg_rules:
    postgresql: true
    redis: true

private_dns:
  create: true
  private_cluster: true

subnets:
  psql:
    name: "sgpaz<workspace_id>-snet-psql"
    type: int
    address_prefixes:
      - "<cidr_block>"    # e.g. "10.95.23.160/27"
    service_endpoints:
      - Microsoft.Storage
    service_delegation:
      name: Microsoft.DBforPostgreSQL/flexibleServers
      actions:
        - Microsoft.Network/virtualNetworks/subnets/join/action

  deployment_scripts:
    name: "sgpaz<workspace_id>-snet-deployscripts"
    type: int
    address_prefixes:
      - "<cidr_block>"
    service_delegation:
      name: Microsoft.ContainerInstance/containerGroups
      actions:
        - Microsoft.Network/virtualNetworks/subnets/action

  aks:
    name: "sgpaz<workspace_id>-snet-aks"
    type: int
    address_prefixes:
      - "<cidr_block>"
    private_endpoint_network_policies: Disabled
    service_endpoints:
      - Microsoft.CognitiveServices

  bastion:
    name: AzureBastionSubnet
    type: int
    address_prefixes:
      - "<cidr_block>"

  jump_host:
    name: "sgpaz<workspace_id>-snet-jumphost"
    type: int
    address_prefixes:
      - "<cidr_block>"

# ── Workload Identity + NAT Gateway ───────────────────────────────────────────
workload_identities:
  enabled: true

nat_gateway:
  enabled: true
  name: "sgpaz<workspace_id>nat"
  zones: []    # [] = regional (no zone pinning)

# ── CMK ───────────────────────────────────────────────────────────────────────
# Set create=true for new deployments. If CMK keys already exist, set create=false
# and provide the key URIs to avoid "already exists (import required)" errors.
cmk:
  enabled: true
  create: true
  # encryption_key_uri: "https://<keyvault>.vault.azure.net/keys/encryptionCMK/<version>"
  # k8s_key_uri: "https://<keyvault>.vault.azure.net/keys/k8sEncryptionCMK/<version>"
  # service_bus_key_uri: "https://<keyvault>.vault.azure.net/keys/serviceBusEncryptionCMK/<version>"

# ── Key Vault ─────────────────────────────────────────────────────────────────
# Override public access settings here if bootstrapping from a laptop or CI runner.
keyvault:
  name: "sgpaz<workspace_id>keyvault"    # Must be globally unique; 3-24 characters
  soft_delete_retention_days: 90          # Must match existing vault if already created
  public_network_access_enabled: true     # Set false once stable; keep true during bootstrap
  network_acls_ip_rules:
    - "<your_public_ip>/32"              # Your laptop or CI runner IP

# ── Core services ─────────────────────────────────────────────────────────────
law:
  create: true

psql:
  create: true
  name: "sgpaz<workspace_id>postgres"    # Must be globally unique
  admin_user: postgres
  bootstrap_aad_principals: true         # Runs once after cluster is up; requires kubectl access

redis:
  create: true
  name: "sgpaz<workspace_id>redis"       # Must be globally unique

storage_account:
  create: true
  name: "sgpaz<workspace_id>storage"    # Must be globally unique; no hyphens
  enable_file_private_endpoint: true

service_bus:
  create: true
  name: "sgpaz<workspace_id>servicebus"
  sku: Premium                            # Premium required for private endpoints
  capacity: 1
  premium_messaging_partitions: 1
  private_endpoint_enabled: true
  public_network_access: false

# ── AI services ───────────────────────────────────────────────────────────────
# Modes: use_openai_via_azure | use_openai_via_custom_endpoint | no_openai
# NOTE: Do not commit real API keys. Keep `key` as a local uncommitted change.
open_ai:
  create: false
  name: "sgpaz<workspace_id>openai"
  client:
    mode: "use_openai_via_custom_endpoint"
    key: "<openai_api_key>"
    custom_url: "<your_azure_openai_endpoint>"

ai_search:
  create: false
  name: "sgpaz<workspace_id>aisearch"

# ── AKS ───────────────────────────────────────────────────────────────────────
aks:
  create: true
  name: "sgpaz<workspace_id>aks"
  node_resource_group: "rg-aks-eus2-<domain_code>-<workspace_id>-dev-01"
  dns_prefix: "egp-k8scluster"
  sku: Free                               # Free | Standard (use Standard for production)
  kubernetes_version: "1.30"
  run_command_enabled: true
  enable_encryption_at_host: true
  use_azure_managed_flux: true
  zones: ["1", "2", "3"]
  istio:
    enabled: true
    ingress_mode: External
    revisions:
      - asm-1-20
  node_pools:
    system:
      name: default
      vm_size: Standard_D4s_v3
      count: 3
      min_count: 3
      max_count: 6
    user:
      enabled: true
      name: user
      vm_size: Standard_D16s_v3
      count: 3
      min_count: 3
      max_count: 10
    gpu:
      enabled: false                      # Set true to enable GPU workloads
      name: gpu
      vm_size: Standard_NV72ads_A10_v5
      min_count: 0
      max_count: 5
    cassandra:
      enabled: false                      # Required only if temporal_db_mode is cassandraK8s
  network:
    pod_cidr: "10.244.0.0/16"
    service_cidr: "10.243.0.0/16"
    dns_service_ip: "10.243.0.10"
    outbound_type: userDefinedRouting

# ── Bastion / jump host ───────────────────────────────────────────────────────
bastion:
  create: true
  admin_username: azureuser
  ssh_public_key: "<your_ssh_public_key>"

# ── Front Door overrides ──────────────────────────────────────────────────────
# Most Front Door settings inherit from default.yaml. Override only what differs.
frontdoor:
  extra_subdomains: []    # e.g. ["chat"] if Agentex UI is enabled

# ── Feature flags ─────────────────────────────────────────────────────────────
feature_flags:
  models: false
  agentex:
    create: false    # Set true to provision Agentex service
  compass:
    create: false    # Set true to provision Compass service
  dex:
    create: false    # Set true to provision Dex (Document Understanding) service
  reducto:
    create: false    # Set true to provision Reducto service

# ── Bootstrapping (System Manager) ────────────────────────────────────────────
bootstrapping:
  enabled: true
  use_managed_flux: true
  system_manager_version: "<system_manager_image_tag>"
  desiredState: |
    {
      "version": "0.1",
      "packs": [
        { "name": "cert-manager" },
        { "name": "egp" },
        { "name": "identity-service" },
        { "name": "spicedb" },
        { "name": "sgp-apps" },
        { "name": "sgp-models" }
      ]
    }

# ── Policy assignments ────────────────────────────────────────────────────────
policy_assignments:
  enabled: false    # Set true to enforce Azure Policy tag rules

# ── Observability ─────────────────────────────────────────────────────────────
observiqidp:
  monitoring:
    enabled: false
  insights:
    enabled: false

Naming constraints for Azure resources:
  • Key Vault names: 3–24 alphanumeric characters and hyphens, globally unique
  • Storage Account names: 3–24 lowercase letters and numbers only (no hyphens), globally unique
  • PostgreSQL and Redis names: globally unique within Azure
  • All names must remain stable after first apply — many Azure resources cannot be renamed
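A quick local check of the length and character rules above can save a failed apply. This sketch encodes only the constraints listed here; global uniqueness can only be confirmed by Azure at plan/apply time, and Azure enforces some additional rules (e.g. Key Vault names must start with a letter):

```python
import re

def valid_keyvault_name(name: str) -> bool:
    # 3-24 alphanumeric characters and hyphens (per the constraints above)
    return re.fullmatch(r"[A-Za-z0-9-]{3,24}", name) is not None

def valid_storage_account_name(name: str) -> bool:
    # 3-24 lowercase letters and numbers only, no hyphens
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

workspace_id = "acme01"  # hypothetical
assert valid_keyvault_name(f"sgpaz{workspace_id}keyvault")
assert valid_storage_account_name(f"sgpaz{workspace_id}storage")
assert not valid_storage_account_name("sgpaz-acme01-storage")  # hyphens rejected
```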

Step 2: Provision Infrastructure via Terraform

Navigate to the infra directory and initialize:
cd infra

terraform init
Review and apply the plan using the main.tfvars.json file you configured in the previous step:
terraform plan -var-file=main.tfvars.json -out=tfplan

# Inspect the plan to review resources before proceeding
terraform show tfplan

terraform apply tfplan
If the Datadog monitoring integration is enabled in your YAML, pass the API key via an environment variable rather than committing it to a file:
TF_VAR_datadog_api_key="<your_key>" terraform plan -var-file=main.tfvars.json -out=tfplan
This step may take significant time (30–60 minutes) due to resource creation dependencies, particularly the AKS cluster and PostgreSQL Flexible Server.
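The plan stage is also a good moment to double-check the CIDR blocks you chose in Step 1: subnet prefixes must not overlap one another or the AKS pod/service CIDRs. A small local check (the example prefixes are hypothetical placeholders; substitute your own <cidr_block> choices):

```python
import ipaddress

# Hypothetical values standing in for your Step 1 choices
cidrs = {
    "psql": "10.95.23.160/27",
    "aks": "10.95.23.0/25",
    "pod_cidr": "10.244.0.0/16",
    "service_cidr": "10.243.0.0/16",
}

def find_overlaps(named_cidrs: dict) -> list:
    """Return every pair of names whose prefixes overlap."""
    nets = {name: ipaddress.ip_network(c) for name, c in named_cidrs.items()}
    names = list(nets)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if nets[a].overlaps(nets[b])]

print(find_overlaps(cidrs))  # prints [] when no prefixes collide
```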

Step 3: Bootstrap the Cluster

The Azure infrastructure automatically bootstraps SGP System Manager via an Azure Deployment Script. When bootstrapping.enabled: true is set in your configuration, Terraform provisions an Azure Container Instance that:
  1. Installs Flux CD on the AKS cluster (using Azure Managed Flux if use_managed_flux: true)
  2. Applies the System Manager HelmRepository and HelmRelease Flux CRDs
  3. Waits for System Manager to reconcile
Monitor bootstrap progress in the Azure Portal under Deployment Scripts in your resource group, or check System Manager logs after bootstrap:
# Get AKS credentials (from inside the VNet or via Bastion — cluster is private by default)
az aks get-credentials \
  --resource-group <resource_group> \
  --name sgpaz<workspace_id>aks \
  --overwrite-existing

# Verify System Manager is running
kubectl get pods -n sgp-system-manager

# Watch System Manager deploy remaining services via Flux
kubectl get helmreleases -A
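If you want to script the wait instead of watching by hand, the Ready condition of each HelmRelease can be read from kubectl's JSON output. A sketch of the parsing, fed with the output of kubectl get helmreleases -A -o json:

```python
import json

def unready_helmreleases(kubectl_json: str) -> list:
    """Names of HelmReleases whose Ready condition is not True."""
    unready = []
    for item in json.loads(kubectl_json).get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            unready.append(item["metadata"]["name"])
    return unready

# Minimal hand-built payload for illustration:
sample = json.dumps({"items": [
    {"metadata": {"name": "egp"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "spicedb"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]})
print(unready_helmreleases(sample))  # prints ['spicedb']
```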

Accessing the Private AKS Cluster

Because the AKS cluster is private by default, you must access it from within the provisioned VNet. Two options are provided:

Option A: From the jump host VM (via Azure Bastion)

The Bastion host and jump host VM are provisioned when bastion.create: true. Connect via the Azure Portal (Bastion blade) or using the helper script:
python3 scripts/connect_private_aks.py --run-mode=local --bootstrap-jump-host
Option B: Using az aks command invoke

When aks.run_command_enabled: true, you can run kubectl commands without VPN access:
az aks command invoke \
  --resource-group <resource_group> \
  --name sgpaz<workspace_id>aks \
  --command "kubectl get helmreleases -A"

Step 4: Configure DNS

After terraform apply completes, retrieve the Front Door endpoint hostname:
terraform output frontdoor_endpoint_host_name
terraform output frontdoor_dns_zone_id
Configure a CNAME record in your DNS provider pointing your deployment_url to the Front Door endpoint (the azurefd.net hostname from the output above).
If using Azure DNS (the DNS zone is managed by Terraform when frontdoor.create: true), CNAME records are created automatically. Verify with:
az network dns record-set list \
  --resource-group <resource_group> \
  --zone-name <deployment_url>
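The records you need follow from the frontdoor.subdomains list in default.yaml. A small helper to enumerate the hostnames that should point at the Front Door endpoint (it assumes "@" denotes the zone apex, as is conventional):

```python
def expected_hostnames(deployment_url: str, subdomains: list) -> list:
    """FQDNs that should resolve to the Front Door endpoint.

    '@' is treated as the zone apex (the deployment_url itself).
    """
    return [deployment_url if s == "@" else f"{s}.{deployment_url}"
            for s in subdomains]

subdomains = ["auth", "api", "@", "admin"]   # from frontdoor.subdomains
hosts = expected_hostnames("acme01.workspace.egp.scale.com", subdomains)
# → ['auth.acme01.workspace.egp.scale.com', 'api.acme01.workspace.egp.scale.com',
#    'acme01.workspace.egp.scale.com', 'admin.acme01.workspace.egp.scale.com']
```

Note that standard DNS forbids a CNAME at the zone apex, so the "@" entry typically needs an Azure DNS alias record (or your provider's ANAME/CNAME-flattening equivalent).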

Step 5: Verify the Deployment

Wait for all services to be ready:
kubectl get helmreleases -A  # All should show Ready=True
kubectl get pods -A          # All should be Running or Completed
System Manager continuously reconciles the desired state. The bootstrapping.desiredState value in your environment YAML is written to a secret in Azure Key Vault during terraform apply, and System Manager reads from that secret at runtime. If a HelmRelease shows Ready=False, check its events:
kubectl describe helmrelease <name> -n <namespace>

Step 6: Configure Identity Provider

SAML Configuration

Set authType: "SAML" in your environment YAML, then configure your Identity Provider with:
  • Service Entity ID: https://auth.<deployment_url>
  • Redirect URI: https://auth.<deployment_url>/dashboard/org/saml/callback
Update the is-saml-secret secret in Key Vault (or via System Manager GUI):
{
  "id": "<workspace_id>",
  "samlConfiguration": {
    "entityId": "<entity_id_from_your_identity_provider>",
    "x509Cert": "<x509_certificate_from_your_identity_provider>",
    "ssoUrl": "<sso_url_from_your_identity_provider>",
    "attributeMappings": {
      "email": "<email_attribute_name>",
      "firstName": "<first_name_attribute_name>",
      "lastName": "<last_name_attribute_name>"
    }
  }
}

OIDC Configuration

Set authType: "OIDC" in your environment YAML, then configure your Identity Provider with:
  • Redirect URI: https://auth.<deployment_url>/dashboard/org/oidc/callback
Update the is-oidc-secret secret in Key Vault (or via System Manager GUI):
{
  "id": "<workspace_id>",
  "oidcConfiguration": {
    "clientId": "<client_id_from_your_identity_provider>",
    "clientSecret": "<client_secret_from_your_identity_provider>",
    "issuer": "<issuer_from_your_identity_provider>",
    "authorizationUrl": "<authorization_url>",
    "tokenUrl": "<token_url>",
    "userInfoUrl": "<user_info_url>"
  }
}
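Malformed JSON in these secrets is an easy way to break login. A sketch of a pre-upload sanity check for either payload shape, with the field names taken from the examples above (this is a local helper, not part of SGP):

```python
import json

REQUIRED = {
    "samlConfiguration": ["entityId", "x509Cert", "ssoUrl", "attributeMappings"],
    "oidcConfiguration": ["clientId", "clientSecret", "issuer",
                          "authorizationUrl", "tokenUrl", "userInfoUrl"],
}

def validate_identity_secret(payload: str) -> list:
    """Return a list of problems; an empty list means the payload looks sane."""
    try:
        doc = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    if not doc.get("id"):
        problems.append("missing 'id' (workspace_id)")
    configs = [k for k in REQUIRED if k in doc]
    if len(configs) != 1:
        problems.append("expected exactly one of samlConfiguration/oidcConfiguration")
    for key in configs:
        for field in REQUIRED[key]:
            if not doc[key].get(field):
                problems.append(f"{key}.{field} is empty or missing")
    return problems
```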
After modifying an identity secret, restart System Manager to apply the changes:
kubectl rollout restart deployment sgp-system-manager -n sgp-system-manager

Accessing the Platform

If all goes smoothly, you should be able to navigate to the SGP platform at https://<workspace_id>.workspace.egp.scale.com (or your custom domain) and authenticate via the configured identity provider.