Overview
This guide walks you through deploying SGP into an Azure subscription using the SGP Azure Terraform modules, which define the SGP Azure infrastructure and are maintained by Scale.
Prerequisites
- Access to an Azure subscription with sufficient permissions to create resources (Contributor + User Access Administrator roles, or equivalent)
- The command-line tools used in this guide: Terraform, the Azure CLI (az), kubectl, and Python 3 (for the optional jump host helper script)
- The following from Scale:
  - The SGP Azure Infrastructure Terraform modules (azure-terraform/infra)
  - A workspace_id and registration_secret unique to your deployment
- A new application configured in your identity provider to authenticate to the SGP platform via SAML or OIDC (optional)
- A custom domain for your deployment (optional)
Installation
Step 1: Build Configuration
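Configuration is split across three files: a selector file, domain-wide defaults, and per-environment overrides. Keeping the defaults separate from the overrides lets you reuse the same baseline for multiple environments.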
infra/
├── main.tfvars.json                      # Selects which domain + environment to deploy
└── config/
    └── <domain_code>/
        ├── default.yaml                  # Domain-wide defaults (all environments)
        └── <location>/
            └── <environment_type>.yaml   # Environment-specific overrides
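The tree above separates domain-wide defaults from per-environment overrides. You can picture the layering as a recursive dictionary merge in which environment values win. This sketch is an illustration under stated assumptions and does not reproduce the SGP modules' logic. deep_merge is a hypothetical helper. It assumes nested mappings merge key by key, while lists and scalars from the environment file replace the default outright. If the exact merge rules matter for a given key, check the modules themselves.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` layered on top of `base`.

    Assumption (hypothetical helper): nested mappings merge recursively, while
    lists and scalars from `override` replace the value in `base` entirely.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Fragments of default.yaml and an environment YAML, already parsed into dicts.
defaults = {
    "keyvault": {
        "public_network_access_enabled": False,
        "purge_protection_enabled": True,
        "network_acls_ip_rules": [],
    }
}
env = {
    "keyvault": {
        "name": "sgpazacme01keyvault",
        "public_network_access_enabled": True,
        "network_acls_ip_rules": ["203.0.113.10/32"],  # documentation IP range
    }
}

effective = deep_merge(defaults, env)
print(effective["keyvault"])
```

Under these assumptions, the environment file's Key Vault override re-enables public access and replaces the IP allowlist. Purge protection is still inherited from the defaults, and the defaults dict itself is left unmodified.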
File 1: main.tfvars.json
This file selects which configuration to load. Edit it to point at your customer and environment before running Terraform.
{
  "environment_type": "<dev|qa|staging|uat|prod>",
  "domain_code": "<A unique 3-9 alphanumeric character string to identify yourself>",
  "location": "<Azure region>"
}
| Field | Description | Allowed values |
|---|---|---|
| environment_type | Environment tier | dev, qa, staging, uat, prod |
| domain_code | Customer identifier (3–9 alphanumeric characters/hyphens) | e.g. acme, contoso |
| location | Azure region | e.g. eastus2, northeurope, southeastasia |
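You can check main.tfvars.json against the table above before running Terraform. This is a minimal sketch, and validate_tfvars is a hypothetical helper that is not part of the SGP modules. The domain_code pattern encodes the table's "3–9 alphanumeric characters/hyphens" rule as a regular expression. The location check only confirms that the value looks like an Azure region short name, such as eastus2. It does not ask Azure whether the region exists.

```python
import json
import re

ALLOWED_ENVIRONMENTS = {"dev", "qa", "staging", "uat", "prod"}
# Assumption: encodes the "3-9 alphanumeric characters/hyphens" rule from the table.
DOMAIN_CODE_RE = re.compile(r"[A-Za-z0-9-]{3,9}")
# Shape check only (e.g. "eastus2"); it does not confirm the region exists.
LOCATION_RE = re.compile(r"[a-z][a-z0-9]*")
REQUIRED_FIELDS = {"environment_type", "domain_code", "location"}


def validate_tfvars(text: str) -> list[str]:
    """Return a list of problems found in the text of a main.tfvars.json file."""
    data = json.loads(text)
    problems = []
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    env = data.get("environment_type")
    if env is not None and not (isinstance(env, str) and env in ALLOWED_ENVIRONMENTS):
        problems.append(f"environment_type {env!r} not in {sorted(ALLOWED_ENVIRONMENTS)}")
    code = data.get("domain_code")
    if code is not None and not (isinstance(code, str) and DOMAIN_CODE_RE.fullmatch(code)):
        problems.append(f"domain_code {code!r} must be 3-9 letters, digits, or hyphens")
    loc = data.get("location")
    if loc is not None and not (isinstance(loc, str) and LOCATION_RE.fullmatch(loc)):
        problems.append(f"location {loc!r} does not look like an Azure region name")
    return problems


good = '{"environment_type": "dev", "domain_code": "acme", "location": "eastus2"}'
bad = '{"environment_type": "production", "domain_code": "ac", "location": "East US 2"}'
print(validate_tfvars(good))
print(validate_tfvars(bad))
```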
File 2: config/<domain_code>/default.yaml
Customer-wide baseline settings that apply to all environments unless overridden. This file is the right place for stable policy and posture decisions: tenant identity, tagging standards, and security baselines for Key Vault, PostgreSQL, Redis, Storage, and other services.
# Identity
business_unit: "<your_business_unit>"
tenant_id: "<your_azure_tenant_id>"
default_tags:
  costcenter: "<cost_center>"
  product: "<domain_code>"
  deploymenttype: "new"
  safe-to-delete: "no"
# CMK key names (Terraform creates these keys when cmk.create=true in the env YAML)
cmk:
  create: true
  encryption_key_name: "encryptionCMK"
  k8s_key_name: "k8sEncryptionCMK"
  service_bus_key_name: "serviceBusEncryptionCMK"
# Key Vault network posture: secure-by-default baseline.
# Override public_network_access_enabled and network_acls_ip_rules in your env YAML
# if you need temporary public access during bootstrap from a laptop or CI runner.
keyvault:
  public_network_access_enabled: false
  enable_rbac_authorization: true
  sku_name: "standard"
  soft_delete_retention_days: 90 # Cannot be changed after vault creation
  purge_protection_enabled: true
  network_acls_default_action: "Deny"
  network_acls_bypass: "AzureServices"
  network_acls_ip_rules: []
# PostgreSQL baseline
psql:
  public_network_access_enabled: false
  sku_name: "GP_Standard_D2s_v3"
  psql_version: "15"
  backup_retention_days: 7
  bootstrap_aad_principals: false
  auth:
    active_directory_auth_enabled: true
    password_auth_enabled: true
  storage_mb: 32768
  storage_tier: "P10"
  auto_grow_enabled: true
  azure_extensions: "uuid-ossp,ltree"
# Redis baseline
redis:
  public_network_access_enabled: false
  sku_name: "Premium"
  family: "P"
  capacity: 1
  active_directory_authentication_enabled: true
  access_policy_name: "Data Owner"
  non_ssl_port_enabled: false
  tls_version: 1.2
  redis_version: 6
# Storage account baseline
storage_account:
  public_network_access_enabled: false
  min_tls_version: "TLS1_2"
  account_tier: "Standard"
  account_replication_type: "LRS"
  allow_nested_items_to_be_public: false
# OpenAI: leave client.mode empty to let Terraform infer it from open_ai.create
open_ai:
  client:
    mode: ""
    key: ""
    org_id: ""
    custom_url: ""
# NAT Gateway baseline
nat_gateway:
  sku_name: "Standard"
  public_ip_sku: "Standard"
  public_ip_allocation_method: "Static"
  zones: ["1", "2", "3"]
  idle_timeout_in_minutes: 10
# AKS baseline
aks:
  load_balancer_sku: "standard"
  only_critical_addons_enabled: true
  key_vault_secret_rotation_enabled: true
# Front Door baseline
frontdoor:
  create: true
  ssl_mode: managed
  minimum_tls_version: "TLS12"
  forwarding_protocol: HttpOnly
  private_link_service_name: "sgp-ingress-lb"
  subdomains: ["auth", "api", "@", "admin"]
  extra_subdomains: []
  dns_ttl_seconds: 60
  ip_filtering:
    enabled: false
    allowed_ip_ranges: []
# System Manager bootstrapping baseline
bootstrapping:
  az_cli_version: "2.59.0"
  helm_chart_version: "2.1.0"
  base_repository: "<your_container_registry_url>"
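The baseline above keeps every data service private, and the environment file may relax this during bootstrap. It therefore helps to list every place where public access ends up enabled in the effective configuration. The sketch below walks an already-parsed configuration, such as the output of a YAML loader, and reports dotted key paths. find_public_access is a hypothetical helper. The sketch assumes public access is controlled only by keys named public_network_access_enabled or public_network_access, which are the names used in this guide's YAML examples.

```python
from typing import Any, Iterator

# Key names that control public access in this guide's YAML examples.
PUBLIC_ACCESS_KEYS = {"public_network_access_enabled", "public_network_access"}


def find_public_access(config: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths of public-access flags that are set to true."""
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            if key in PUBLIC_ACCESS_KEYS and value is True:
                yield child
            else:
                # Recurse into nested sections; non-dict values yield nothing.
                yield from find_public_access(value, child)


effective = {
    "keyvault": {"public_network_access_enabled": True},  # relaxed for bootstrap
    "psql": {"public_network_access_enabled": False},
    "redis": {"public_network_access_enabled": False},
    "service_bus": {"public_network_access": False},
}
print(list(find_public_access(effective)))
```

Running a check like this before promoting an environment makes it harder to leave a bootstrap-time exception, such as public Key Vault access, in place by accident.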
File 3: config/<domain_code>/<location>/<environment_type>.yaml
Environment-specific configuration. This is where you set everything that differs per environment: resource names, subscription, CIDRs, feature flags, node pool sizing, and bootstrapping state.
# ── Environment identity ───────────────────────────────────────────────────────
location: eastus2
subscription_id: "<your_azure_subscription_id>"
name_suffix: "<workspace_id>" # e.g. "acme01"
deployment_id: "<workspace_id>"
deployment_url: "<workspace_id>.workspace.egp.scale.com"
temporal_db_mode: postgresCommonHosted # Recommended; or "cassandraK8s" (not recommended)
# ── Auth / secrets ────────────────────────────────────────────────────────────
authType: default # Options: default, SAML, OIDC
RBAC: false
registration_secret: "" # Auto-generated if empty
identity_service_jwt: "" # Auto-generated if empty
# ── OIDC (required only when authType = OIDC) ─────────────────────────────────
oidc:
  clientId: ""
  clientSecret: ""
  issuer: ""
  authorizationUrl: ""
  tokenUrl: ""
  userInfoUrl: ""
# ── SAML (required only when authType = SAML) ─────────────────────────────────
saml:
  X509Cert: ""
  SSOUrl: ""
  emailAttrName: ""
  firstNameAttrName: ""
  lastNameAttrName: ""
# ── SSL ───────────────────────────────────────────────────────────────────────
sslMode: e2e # Options: e2e, managed
# SSL cert/key for Istio ingress (PEM file paths or base64). Required when sslMode=e2e.
istio:
  sslCert: "<path_to_pem_certificate>"
  sslKey: "<path_to_pem_private_key>"
# ── Resource group + networking ───────────────────────────────────────────────
resource_group: "rg-eus2-<domain_code>-<workspace_id>-dev-01"
network:
  vnet: "sgpaz<workspace_id>-vnet"
  int_rt_name: "sgpaz<workspace_id>-rt-int"
  int_nsg_name: "sgpaz<workspace_id>-nsg-int"
  private_cluster: true
  nsg_rules:
    postgresql: true
    redis: true
  private_dns:
    create: true
    private_cluster: true
  subnets:
    psql:
      name: "sgpaz<workspace_id>-snet-psql"
      type: int
      address_prefixes:
        - "<cidr_block>" # e.g. "10.95.23.160/27"
      service_endpoints:
        - Microsoft.Storage
      service_delegation:
        name: Microsoft.DBforPostgreSQL/flexibleServers
        actions:
          - Microsoft.Network/virtualNetworks/subnets/join/action
    deployment_scripts:
      name: "sgpaz<workspace_id>-snet-deployscripts"
      type: int
      address_prefixes:
        - "<cidr_block>"
      service_delegation:
        name: Microsoft.ContainerInstance/containerGroups
        actions:
          - Microsoft.Network/virtualNetworks/subnets/action
    aks:
      name: "sgpaz<workspace_id>-snet-aks"
      type: int
      address_prefixes:
        - "<cidr_block>"
      private_endpoint_network_policies: Disabled
      service_endpoints:
        - Microsoft.CognitiveServices
    bastion:
      name: AzureBastionSubnet
      type: int
      address_prefixes:
        - "<cidr_block>"
    jump_host:
      name: "sgpaz<workspace_id>-snet-jumphost"
      type: int
      address_prefixes:
        - "<cidr_block>"
# ── Workload Identity + NAT Gateway ───────────────────────────────────────────
workload_identities:
  enabled: true
nat_gateway:
  enabled: true
  name: "sgpaz<workspace_id>nat"
  zones: [] # [] = regional (no zone pinning)
# ── CMK ───────────────────────────────────────────────────────────────────────
# Set create=true for new deployments. If CMK keys already exist, set create=false
# and provide the key URIs to avoid "already exists (import required)" errors.
cmk:
  enabled: true
  create: true
  # encryption_key_uri: "https://<keyvault>.vault.azure.net/keys/encryptionCMK/<version>"
  # k8s_key_uri: "https://<keyvault>.vault.azure.net/keys/k8sEncryptionCMK/<version>"
  # service_bus_key_uri: "https://<keyvault>.vault.azure.net/keys/serviceBusEncryptionCMK/<version>"
# ── Key Vault ─────────────────────────────────────────────────────────────────
# Override public access settings here if bootstrapping from a laptop or CI runner.
keyvault:
  name: "sgpaz<workspace_id>keyvault" # Must be globally unique; 3-24 characters
  soft_delete_retention_days: 90 # Must match existing vault if already created
  public_network_access_enabled: true # Set false once stable; keep true during bootstrap
  network_acls_ip_rules:
    - "<your_public_ip>/32" # Your laptop or CI runner IP
# ── Core services ─────────────────────────────────────────────────────────────
law:
  create: true
psql:
  create: true
  name: "sgpaz<workspace_id>postgres" # Must be globally unique
  admin_user: postgres
  bootstrap_aad_principals: true # Runs once after cluster is up; requires kubectl access
redis:
  create: true
  name: "sgpaz<workspace_id>redis" # Must be globally unique
storage_account:
  create: true
  name: "sgpaz<workspace_id>storage" # Must be globally unique; lowercase letters and digits only
  enable_file_private_endpoint: true
service_bus:
  create: true
  name: "sgpaz<workspace_id>servicebus"
  sku: Premium # Premium required for private endpoints
  capacity: 1
  premium_messaging_partitions: 1
  private_endpoint_enabled: true
  public_network_access: false
# ── AI services ───────────────────────────────────────────────────────────────
# Modes: use_openai_via_azure | use_openai_via_custom_endpoint | no_openai
# NOTE: Do not commit real API keys. Keep `key` as a local uncommitted change.
open_ai:
  create: false
  name: "sgpaz<workspace_id>openai"
  client:
    mode: "use_openai_via_custom_endpoint"
    key: "<openai_api_key>"
    custom_url: "<your_azure_openai_endpoint>"
ai_search:
  create: false
  name: "sgpaz<workspace_id>aisearch"
# ── AKS ───────────────────────────────────────────────────────────────────────
aks:
  create: true
  name: "sgpaz<workspace_id>aks"
  node_resource_group: "rg-aks-eus2-<domain_code>-<workspace_id>-dev-01"
  dns_prefix: "egp-k8scluster"
  sku: Free # Free | Standard (use Standard for production)
  kubernetes_version: "1.30"
  run_command_enabled: true
  enable_encryption_at_host: true
  use_azure_managed_flux: true
  zones: ["1", "2", "3"]
  istio:
    enabled: true
    ingress_mode: External
    revisions:
      - asm-1-20
  node_pools:
    system:
      name: default
      vm_size: Standard_D4s_v3
      count: 3
      min_count: 3
      max_count: 6
    user:
      enabled: true
      name: user
      vm_size: Standard_D16s_v3
      count: 3
      min_count: 3
      max_count: 10
    gpu:
      enabled: false # Set true to enable GPU workloads
      name: gpu
      vm_size: Standard_NV72ads_A10_v5
      min_count: 0
      max_count: 5
    cassandra:
      enabled: false # Required only if temporal_db_mode is cassandraK8s
  network:
    pod_cidr: "10.244.0.0/16"
    service_cidr: "10.243.0.0/16"
    dns_service_ip: "10.243.0.10"
    outbound_type: userDefinedRouting
# ── Bastion / jump host ───────────────────────────────────────────────────────
bastion:
  create: true
  admin_username: azureuser
  ssh_public_key: "<your_ssh_public_key>"
# ── Front Door overrides ──────────────────────────────────────────────────────
# Most Front Door settings inherit from default.yaml. Override only what differs.
frontdoor:
  extra_subdomains: [] # e.g. ["chat"] if Agentex UI is enabled
# ── Feature flags ─────────────────────────────────────────────────────────────
feature_flags:
  models: false
agentex:
  create: false # Set true to provision Agentex service
compass:
  create: false # Set true to provision Workflows service
dex:
  create: false # Set true to provision Dex (Document Understanding) service
reducto:
  create: false # Set true to provision Reducto service
# ── Bootstrapping (System Manager) ────────────────────────────────────────────
bootstrapping:
  enabled: true
  use_managed_flux: true
  system_manager_version: "<system_manager_image_tag>"
  desiredState: |
    {
      "version": "0.1",
      "packs": [
        { "name": "cert-manager" },
        { "name": "egp" },
        { "name": "identity-service" },
        { "name": "spicedb" },
        { "name": "sgp-apps" },
        { "name": "sgp-models" }
      ]
    }
# ── Policy assignments ────────────────────────────────────────────────────────
policy_assignments:
  enabled: false # Set true to enforce Azure Policy tag rules
# ── Observability ─────────────────────────────────────────────────────────────
observiqidp:
  monitoring:
    enabled: false
  insights:
    enabled: false
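Before running Terraform, it is worth checking the ranges you substitute for the <cidr_block> placeholders. Azure rejects overlapping subnets within a VNet, and AKS requires dns_service_ip to fall inside service_cidr. The sketch below uses Python's standard ipaddress module to check three things. No two subnets may overlap, pod_cidr and service_cidr must not collide with any subnet, and dns_service_ip must sit inside service_cidr. check_network is a hypothetical helper and the subnet values are illustrative only. Azure also enforces rules the sketch does not cover, such as minimum subnet sizes (AzureBastionSubnet must be /26 or larger).

```python
import ipaddress
from itertools import combinations


def check_network(subnets: dict[str, str], pod_cidr: str,
                  service_cidr: str, dns_service_ip: str) -> list[str]:
    """Return human-readable problems with the planned address space."""
    problems = []
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    # Subnets in the same VNet must not overlap one another.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"subnet {a} ({net_a}) overlaps {b} ({net_b})")
    # Kubernetes pod and service ranges should not collide with VNet subnets.
    for label, cidr in (("pod_cidr", pod_cidr), ("service_cidr", service_cidr)):
        k8s_net = ipaddress.ip_network(cidr)
        for name, net in nets.items():
            if k8s_net.overlaps(net):
                problems.append(f"{label} ({k8s_net}) overlaps subnet {name} ({net})")
    # The cluster DNS service IP must be an address inside service_cidr.
    if ipaddress.ip_address(dns_service_ip) not in ipaddress.ip_network(service_cidr):
        problems.append(f"dns_service_ip {dns_service_ip} is outside {service_cidr}")
    return problems


example_subnets = {  # Illustrative values only
    "psql": "10.95.23.160/27",
    "deployment_scripts": "10.95.23.192/28",
    "aks": "10.95.16.0/22",
    "bastion": "10.95.23.0/26",
    "jump_host": "10.95.23.64/28",
}
print(check_network(example_subnets, "10.244.0.0/16", "10.243.0.0/16", "10.243.0.10"))
```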
Naming constraints for Azure resources:
- Key Vault names: 3–24 alphanumeric characters and hyphens, starting with a letter, ending with a letter or digit, with no consecutive hyphens; globally unique
- Storage Account names: 3–24 lowercase letters and numbers only (no hyphens), globally unique
- PostgreSQL and Redis names: globally unique within Azure
- All names must remain stable after first apply — many Azure resources cannot be renamed
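Names cannot be changed after the first apply, so it is worth checking them locally first. The sketch below encodes the Key Vault and Storage Account rules from this list. The Key Vault check also applies Azure's published rules that a name must start with a letter, end with a letter or digit, and contain no consecutive hyphens. Global uniqueness can only be confirmed against Azure, so it is not checked. valid_keyvault_name, valid_storage_account_name and check_templates are hypothetical helpers. The templates mirror the sgpaz<workspace_id>... names used in the environment YAML.

```python
import re


def valid_keyvault_name(name: str) -> bool:
    """3-24 letters, digits, or hyphens; starts with a letter, ends alphanumeric, no '--'."""
    return (
        3 <= len(name) <= 24
        and re.fullmatch(r"[A-Za-z0-9-]+", name) is not None
        and name[0].isalpha()
        and name[-1].isalnum()
        and "--" not in name
    )


def valid_storage_account_name(name: str) -> bool:
    """3-24 lowercase letters and digits only (no hyphens)."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def check_templates(workspace_id: str) -> dict[str, bool]:
    """Render this guide's name templates for a workspace_id and validate them."""
    kv = f"sgpaz{workspace_id}keyvault"
    sa = f"sgpaz{workspace_id}storage"
    return {kv: valid_keyvault_name(kv), sa: valid_storage_account_name(sa)}


print(check_templates("acme01"))
print(check_templates("acme-workspace1"))
```

With these templates, sgpaz<workspace_id>keyvault leaves room for at most 11 characters of workspace_id (24 minus the 13 fixed characters). sgpaz<workspace_id>storage leaves room for at most 12, all of which must be lowercase letters or digits.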
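Step 2: Deploy Infrastructure
Navigate to the infra directory (azure-terraform/infra in the modules provided by Scale) and initialize Terraform:
terraform init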
Create a plan from the main.tfvars.json file you configured in Step 1, review it, and then apply it:
terraform plan -var-file=main.tfvars.json -out=tfplan
# Inspect the plan to review resources before proceeding
terraform show tfplan
terraform apply tfplan
If the Datadog monitoring integration is enabled in your YAML, pass the API key through an environment variable rather than committing it to a file:
TF_VAR_datadog_api_key="<your_key>" terraform plan -var-file=main.tfvars.json -out=tfplan
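If you script the plan step, for example in CI, the same rule applies: inject TF_VAR_datadog_api_key into the Terraform process environment instead of writing it to disk. Terraform reads input variables from TF_VAR_<name> environment variables. The sketch below builds the terraform plan command shown above and a child environment for it. It assumes the runner already exposes the key in an environment variable named DATADOG_API_KEY. That name is hypothetical, so use whatever your secret store provides. By default the sketch only builds and prints the command, so it runs without Terraform installed.

```python
import os
import subprocess
from typing import Optional


def plan_command(var_file: str = "main.tfvars.json", out: str = "tfplan") -> list[str]:
    """Build the terraform plan invocation used in this guide."""
    return ["terraform", "plan", f"-var-file={var_file}", f"-out={out}"]


def terraform_env(datadog_key: Optional[str]) -> dict[str, str]:
    """Copy the current environment and add TF_VAR_datadog_api_key if a key is given.

    Terraform reads input variables from TF_VAR_<name> environment variables,
    so the key never has to be written into a tfvars or YAML file.
    """
    env = dict(os.environ)
    if datadog_key:
        env["TF_VAR_datadog_api_key"] = datadog_key
    return env


def run_plan(dry_run: bool = True) -> list[str]:
    """Run (or, by default, only build) the plan command with the key injected."""
    cmd = plan_command()
    # Assumption: the runner exposes the key as DATADOG_API_KEY (hypothetical name).
    env = terraform_env(os.environ.get("DATADOG_API_KEY"))
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd


print(" ".join(run_plan()))
```

Terraform stores variable values in the saved plan file, so treat tfplan itself as sensitive and keep it out of version control as well.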
Expect the apply to take 30–60 minutes. Most of that time goes to the AKS cluster and the PostgreSQL Flexible Server, which other resources depend on.
Step 3: Bootstrap the Cluster
The Azure infrastructure automatically bootstraps SGP System Manager via an Azure Deployment Script. When bootstrapping.enabled: true is set in your configuration, Terraform provisions an Azure Container Instance that:
- Installs Flux CD on the AKS cluster (using Azure Managed Flux if use_managed_flux: true)
- Applies the System Manager HelmRepository and HelmRelease Flux resources
- Waits for System Manager to reconcile
Monitor bootstrap progress in the Azure Portal under Deployment Scripts in your resource group. Once bootstrap completes, you can also check System Manager directly on the cluster:
# Get AKS credentials (from inside the VNet or via Bastion — cluster is private by default)
az aks get-credentials \
--resource-group <resource_group> \
--name sgpaz<workspace_id>aks \
--overwrite-existing
# Verify System Manager is running
kubectl get pods -n sgp-system-manager
# Watch System Manager deploy remaining services via Flux
kubectl get helmreleases -A
Accessing the Private AKS Cluster
Because the AKS cluster is private by default, its API server is not reachable from the public internet. You must either run kubectl from within the provisioned VNet or send commands through Azure. Two options are provided:
Option A: From the jump host VM (via Azure Bastion)
The Bastion host and jump host VM are provisioned when bastion.create: true. Connect via the Azure Portal (Bastion blade) or using the helper script:
python3 scripts/connect_private_aks.py --run-mode=local --bootstrap-jump-host
Option B: Using az aks command invoke
When aks.run_command_enabled: true, you can run kubectl commands through Azure Resource Manager without any network access to the VNet:
az aks command invoke \
--resource-group <resource_group> \
--name sgpaz<workspace_id>aks \
--command "kubectl get helmreleases -A"
Step 4: Configure DNS
After terraform apply completes, retrieve the Front Door endpoint hostname and DNS zone ID:
terraform output frontdoor_endpoint_host_name
terraform output frontdoor_dns_zone_id
Configure a CNAME record in your DNS provider pointing your deployment_url to the Front Door endpoint (the azurefd.net hostname from the output above).
If using Azure DNS (the DNS zone is managed by Terraform when frontdoor.create: true), CNAME records are created automatically. Verify with:
az network dns record-set list \
--resource-group <resource_group> \
--zone-name <deployment_url>
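If you manage DNS outside Azure, you need one record per Front Door hostname. The sketch below derives those hostnames from the frontdoor.subdomains and extra_subdomains lists in your configuration. It makes two assumptions, and frontdoor_records is a hypothetical helper. First, each entry maps to <subdomain>.<deployment_url>, and @ denotes deployment_url itself, which matches the zone name used by the Azure DNS command above. Second, every record targets the single azurefd.net hostname returned by terraform output frontdoor_endpoint_host_name. DNS does not allow a CNAME at a zone apex. If deployment_url is itself the apex of your zone, you may need your provider's alias or CNAME-flattening feature for the @ entry.

```python
def frontdoor_records(deployment_url: str, endpoint_host: str,
                      subdomains: list[str],
                      extra_subdomains: list[str]) -> list[tuple[str, str, str]]:
    """Return (hostname, record_type, target) tuples for an external DNS provider."""
    records = []
    for sub in [*subdomains, *extra_subdomains]:
        # "@" denotes the zone apex, i.e. deployment_url itself.
        host = deployment_url if sub == "@" else f"{sub}.{deployment_url}"
        records.append((host, "CNAME", endpoint_host))
    return records


for name, rtype, target in frontdoor_records(
    deployment_url="sgp.example.com",
    endpoint_host="sgp-endpoint-abc123.z01.azurefd.net",  # illustrative value
    subdomains=["auth", "api", "@", "admin"],
    extra_subdomains=["chat"],
):
    print(f"{name:<24} {rtype:<6} {target}")
```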
Step 5: Verify the Deployment
Wait for all services to be ready:
kubectl get helmreleases -A # All should show Ready=True
kubectl get pods -A # All should be Running or Completed
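To turn the readiness check above into something a script can act on, parse the JSON form of the same listing (kubectl get helmreleases -A -o json) and inspect each release's Ready condition. The sketch below assumes the standard Kubernetes condition layout that Flux HelmRelease objects report: status.conditions[] entries with type, status and message. not_ready_releases is a hypothetical helper. The sample document is trimmed, and its namespaces and messages are invented for illustration.

```python
import json


def not_ready_releases(kubectl_json: str) -> list[str]:
    """Return 'namespace/name: message' for HelmReleases whose Ready condition is not True."""
    problems = []
    for item in json.loads(kubectl_json).get("items", []):
        meta = item.get("metadata", {})
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            message = ready.get("message", "") if ready else "no Ready condition yet"
            problems.append(f"{meta.get('namespace')}/{meta.get('name')}: {message}")
    return problems


sample = json.dumps({
    "items": [
        {"metadata": {"namespace": "sgp-system-manager", "name": "sgp-system-manager"},
         "status": {"conditions": [{"type": "Ready", "status": "True", "message": "ok"}]}},
        {"metadata": {"namespace": "egp", "name": "egp"},
         "status": {"conditions": [{"type": "Ready", "status": "False",
                                    "message": "install retries exhausted"}]}},
        {"metadata": {"namespace": "spicedb", "name": "spicedb"}, "status": {}},
    ]
})
print(not_ready_releases(sample))
```

In practice you would feed the function the live output of kubectl get helmreleases -A -o json, for example by reading sys.stdin. You could then run kubectl describe helmrelease on each release it reports.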
System Manager continuously reconciles the desired state. The bootstrapping.desiredState value in your environment YAML is written to a secret in Azure Key Vault during terraform apply, and System Manager reads from that secret at runtime. If a HelmRelease shows Ready=False, check its events:
kubectl describe helmrelease <name> -n <namespace>
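bootstrapping.desiredState is a JSON document embedded in YAML, so a malformed edit is easy to miss until System Manager reads it. The sketch below builds and re-parses that document in Python. It relies only on the shape shown in the environment YAML example: a version string plus a list of packs, each with a name. render_desired_state and pack_names are hypothetical helpers. The sketch does not know which pack names or extra fields System Manager actually accepts.

```python
import json

# Pack names taken from the example desiredState in this guide.
EXAMPLE_PACKS = ["cert-manager", "egp", "identity-service", "spicedb", "sgp-apps", "sgp-models"]


def render_desired_state(packs: list[str], version: str = "0.1") -> str:
    """Serialize a desiredState document, rejecting empty or duplicate pack names."""
    if any(not p for p in packs):
        raise ValueError("pack names must be non-empty")
    duplicates = sorted({p for p in packs if packs.count(p) > 1})
    if duplicates:
        raise ValueError(f"duplicate packs: {duplicates}")
    return json.dumps({"version": version, "packs": [{"name": p} for p in packs]}, indent=2)


def pack_names(desired_state: str) -> list[str]:
    """Parse a desiredState string (e.g. copied from the YAML) and list its packs."""
    doc = json.loads(desired_state)
    return [pack["name"] for pack in doc["packs"]]


state = render_desired_state(EXAMPLE_PACKS)
print(state)
```

Paste the printed JSON under desiredState: | in your environment YAML, indented one level deeper than the key, so that it stays inside the YAML block scalar.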
SAML Configuration
Set authType: "SAML" in your environment YAML, then configure your Identity Provider with:
- Service Provider Entity ID: https://auth.<deployment_url>
- Redirect URI (Assertion Consumer Service URL): https://auth.<deployment_url>/dashboard/org/saml/callback
Update the is-saml-secret secret in Key Vault (or use the System Manager GUI):
{
  "id": "<workspace_id>",
  "samlConfiguration": {
    "entityId": "<entity_id_from_your_identity_provider>",
    "x509Cert": "<x509_certificate_from_your_identity_provider>",
    "ssoUrl": "<sso_url_from_your_identity_provider>",
    "attributeMappings": {
      "email": "<email_attribute_name>",
      "firstName": "<first_name_attribute_name>",
      "lastName": "<last_name_attribute_name>"
    }
  }
}
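Hand-editing nested secret JSON is error-prone, so you may prefer to generate it. The sketch below builds the is-saml-secret payload using exactly the field names shown above and prints it. You could then upload it, for example with az keyvault secret set --vault-name <vault> --name is-saml-secret --file <file>, or paste it into the System Manager GUI. build_saml_secret is a hypothetical helper, and all argument values are placeholders.

```python
import json


def build_saml_secret(workspace_id: str, entity_id: str, x509_cert: str, sso_url: str,
                      email_attr: str, first_name_attr: str, last_name_attr: str) -> dict:
    """Assemble the is-saml-secret payload using the field names shown in this guide."""
    return {
        "id": workspace_id,
        "samlConfiguration": {
            "entityId": entity_id,
            "x509Cert": x509_cert,
            "ssoUrl": sso_url,
            "attributeMappings": {
                "email": email_attr,
                "firstName": first_name_attr,
                "lastName": last_name_attr,
            },
        },
    }


secret = build_saml_secret(
    workspace_id="acme01",                        # placeholder values throughout
    entity_id="https://idp.example.com/metadata",
    x509_cert="<base64_certificate_body>",
    sso_url="https://idp.example.com/sso/saml",
    email_attr="email",
    first_name_attr="givenName",
    last_name_attr="surname",
)
print(json.dumps(secret, indent=2))
```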
OIDC Configuration
Set authType: "OIDC" in your environment YAML, then configure your Identity Provider with:
- Redirect URI: https://auth.<deployment_url>/dashboard/org/oidc/callback
Update the is-oidc-secret secret in Key Vault (or use the System Manager GUI):
{
  "id": "<workspace_id>",
  "oidcConfiguration": {
    "clientId": "<client_id_from_your_identity_provider>",
    "clientSecret": "<client_secret_from_your_identity_provider>",
    "issuer": "<issuer_from_your_identity_provider>",
    "authorizationUrl": "<authorization_url>",
    "tokenUrl": "<token_url>",
    "userInfoUrl": "<user_info_url>"
  }
}
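Most OIDC providers publish their issuer and endpoint URLs in a discovery document at <issuer>/.well-known/openid-configuration, as defined by OpenID Connect Discovery 1.0. The sketch below maps the standard discovery fields onto the is-oidc-secret fields shown above: issuer, authorization_endpoint, token_endpoint and userinfo_endpoint. It takes an already-fetched discovery document as a dict, so it runs offline. userinfo_endpoint is only recommended, not required, by the specification, so the helper reports any missing field instead of guessing. build_oidc_secret is a hypothetical helper, and the sample values are invented. Keep the real client secret out of version control.

```python
import json


def build_oidc_secret(workspace_id: str, client_id: str, client_secret: str,
                      discovery: dict) -> dict:
    """Map OIDC discovery metadata onto the is-oidc-secret payload fields."""
    required = ["issuer", "authorization_endpoint", "token_endpoint", "userinfo_endpoint"]
    missing = [key for key in required if key not in discovery]
    if missing:
        raise KeyError(f"discovery document lacks {missing}")
    return {
        "id": workspace_id,
        "oidcConfiguration": {
            "clientId": client_id,
            "clientSecret": client_secret,
            "issuer": discovery["issuer"],
            "authorizationUrl": discovery["authorization_endpoint"],
            "tokenUrl": discovery["token_endpoint"],
            "userInfoUrl": discovery["userinfo_endpoint"],
        },
    }


# Trimmed, invented discovery document; a real one is fetched from
# <issuer>/.well-known/openid-configuration.
discovery = {
    "issuer": "https://login.example.com",
    "authorization_endpoint": "https://login.example.com/oauth2/authorize",
    "token_endpoint": "https://login.example.com/oauth2/token",
    "userinfo_endpoint": "https://login.example.com/oauth2/userinfo",
}
secret = build_oidc_secret("acme01", "sgp-client", "<client_secret>", discovery)
print(json.dumps(secret, indent=2))
```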
After modifying an identity secret, restart System Manager to apply the changes:
kubectl rollout restart deployment sgp-system-manager -n sgp-system-manager
If all goes smoothly, you should be able to navigate to the SGP platform at https://<workspace_id>.workspace.egp.scale.com (or your custom domain) and authenticate via the configured identity provider.