# GCP Architecture Reference

> Overview of the SGP GCP architecture

## Architecture Diagram

```mermaid
graph TD
    USER(["Client"])
    ENGINEER(["Engineer"])

    subgraph GCP["GCP Project"]
        GLB["Global Load Balancer<br/>Static IP · TLS 1.2+"]
        IAP["Cloud IAP"]

        subgraph VPC["VPC — 10.0.0.0/16"]
            INGRESS["Istio Ingress"]

            subgraph GKE["GKE Cluster"]
                SYS["System Pool<br/>3× n2-standard-4"]
                CPU["CPU Pool<br/>4–30× n2-standard-16"]
            end

            BASTION["Bastion VM<br/>e2-micro"]
        end

        subgraph PEERED["VPC-Peered — Cloud SQL"]
            PG_MAIN[("Cloud SQL<br/>PostgreSQL 17 (main)")]
            PG_TEMP[("Cloud SQL<br/>PostgreSQL 17 (temporal)")]
        end

        SM["Secret Manager"]

        subgraph GCS_GROUP["Cloud Storage"]
            BUCKET[("GCS — platform")]
            KB[("GCS — knowledge base")]
        end

        AR["Artifact Registry<br/>Docker + Helm"]
        KMS["Cloud KMS<br/>CMEK (optional)"]
    end

    USER --> GLB
    GLB --> INGRESS
    INGRESS --> CPU
    CPU -->|"VPC Peering · SSL enforced"| PG_MAIN
    CPU -->|"VPC Peering · SSL enforced"| PG_TEMP
    CPU -->|"Workload Identity"| SM
    CPU -->|"Workload Identity"| BUCKET
    CPU -->|"Workload Identity"| KB
    AR -->|"Workload Identity"| CPU
    KMS -.->|"encrypts"| PG_MAIN
    KMS -.->|"encrypts"| PG_TEMP
    KMS -.->|"encrypts"| BUCKET
    ENGINEER --> IAP --> BASTION -->|"private API"| GKE
```

***

## Terraform Structure

SGP's GCP infrastructure is provisioned in two separate Terraform phases with different privilege levels:

```mermaid
graph LR
    CREDS(["Personal GCP<br/>Credentials"])

    subgraph PS["projectsetup/ — run once with personal credentials"]
        SA["Terraform<br/>Service Account"]
        AR_PS["Artifact Registries<br/>Docker + Helm"]
        SMK["SA Key stored in<br/>Secret Manager"]
    end

    subgraph DEP["deployments/name/ — SA credentials auto-loaded from Secret Manager"]
        GKE_D["GKE Cluster"]
        SQL_D["Cloud SQL × 2"]
        VPC_D["VPC + Subnets"]
        SEC_D["Secrets + IAM<br/>+ Workload Identity"]
    end

    CREDS -->|"terraform apply"| PS
    SMK -->|"credentials<br/>auto-read at plan time"| DEP
```

| Phase                   | Directory             | Credentials                                          | Scope                                                |
| ----------------------- | --------------------- | ---------------------------------------------------- | ---------------------------------------------------- |
| Privileged bootstrap    | `projectsetup/`       | Your personal GCP identity                           | Service account, Artifact Registries, API enablement |
| Deprivileged main infra | `deployments/<name>/` | Terraform service account (read from Secret Manager) | GKE, Cloud SQL, networking, IAM, secrets             |

The main infrastructure phase reads the Terraform service account key directly from Secret Manager at plan time, so no key file needs to exist on disk during the infrastructure run.
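
The in-memory credential flow can be sketched in Python. This is only an illustration: the deployment does this inside Terraform, and `fetch_payload` is a hypothetical stand-in for whatever Secret Manager client is available. The `projects/<project>/secrets/<secret>/versions/<version>` resource-name format is Secret Manager's own; the project ID `example-project` is an assumption.

```python
import json
from typing import Callable

# Secret name taken from the Secret Manager table in this document.
TF_SA_KEY_SECRET = "terraform-service-account-key-secret"


def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name for one secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def load_sa_credentials(project_id: str, fetch_payload: Callable[[str], bytes]) -> dict:
    """Fetch the SA key JSON and parse it in memory; nothing is written to disk.

    `fetch_payload` is a hypothetical callable that takes a secret version
    resource name and returns the raw payload bytes.
    """
    name = secret_version_name(project_id, TF_SA_KEY_SECRET)
    return json.loads(fetch_payload(name).decode("utf-8"))


if __name__ == "__main__":
    # Fake fetcher so the sketch runs without GCP access.
    def fake_fetch(name: str) -> bytes:
        return json.dumps({"type": "service_account", "requested": name}).encode()

    creds = load_sa_credentials("example-project", fake_fetch)
    print(creds["type"], creds["requested"])
```

Passing the fetcher in as a parameter keeps the sketch runnable offline and makes the "no key file on disk" property easy to see: the payload only ever exists as bytes and a parsed dictionary.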

***

## Resources by Type

### Compute Resources

| Resource                     | Count | Purpose                                    |
| ---------------------------- | ----- | ------------------------------------------ |
| GKE Cluster                  | 1     | Kubernetes orchestration                   |
| System Node Pool (`default`) | 1     | System pods — tainted `CriticalAddonsOnly` |
| CPU Node Pool (`cpu`)        | 1     | Application workloads                      |
| GPU Node Pool (`gpu`)        | 0–1   | AI/ML workloads (optional)                 |
| Cassandra Node Pool          | 0–1   | Temporal database (optional)               |
| Bastion Host                 | 0–1   | Private cluster access via IAP (optional)  |

**Default node pool sizing:**

| Pool               | Machine Type     | Min Nodes | Max Nodes               |
| ------------------ | ---------------- | --------- | ----------------------- |
| System (`default`) | `n2-standard-4`  | 3         | 10                      |
| CPU (`cpu`)        | `n2-standard-16` | 4         | 30                      |
| GPU (`gpu`)        | `a2-highgpu-1g`  | 0         | 0 (disabled by default) |
| Cassandra          | `n2-standard-4`  | 3         | 6                       |

All pools use the `COS_CONTAINERD` image type and are preemptible by default (configurable via `node_pool_config.preemptible`).
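
For capacity planning, the autoscaling bounds in the sizing table translate into a vCPU range. The sketch below works that arithmetic through for the two always-on pools, assuming the standard vCPU counts for the listed machine types (`n2-standard-4` has 4 vCPUs, `n2-standard-16` has 16). The `NodePool` class and `vcpu_range` helper are illustrative names, not part of the SGP Terraform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    vcpus_per_node: int  # determined by the machine type
    min_nodes: int
    max_nodes: int


# Always-on pools from the sizing table. The GPU pool is disabled by default
# and the Cassandra pool is optional, so both are left out here.
DEFAULT_POOLS = [
    NodePool("default", vcpus_per_node=4, min_nodes=3, max_nodes=10),
    NodePool("cpu", vcpus_per_node=16, min_nodes=4, max_nodes=30),
]


def vcpu_range(pools):
    """Return (minimum, maximum) total vCPUs across the given pools."""
    low = sum(p.vcpus_per_node * p.min_nodes for p in pools)
    high = sum(p.vcpus_per_node * p.max_nodes for p in pools)
    return low, high


if __name__ == "__main__":
    print(vcpu_range(DEFAULT_POOLS))  # (76, 520)
```

With the defaults, the cluster therefore idles at 76 vCPUs (3 × 4 + 4 × 16) and can scale to 520 (10 × 4 + 30 × 16), which is worth checking against the project's regional CPU quota before the first apply.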

***

### Network Resources

| Resource            | Count | Purpose                             |
| ------------------- | ----- | ----------------------------------- |
| VPC Network         | 1     | Network boundary                    |
| Subnetwork          | 1     | Kubernetes nodes                    |
| Secondary IP Ranges | 2     | GKE pod and service CIDRs           |
| Cloud DNS Zone      | 1     | Internal + external name resolution |
| Global Static IP    | 1     | Load balancer ingress endpoint      |
| SSL Policy          | 1     | Minimum TLS 1.2 enforcement         |
| Firewall Rules      | 4–5   | Traffic control (offline mode)      |
| VPC Peering         | 1     | Private connectivity to Cloud SQL   |
| Private Route       | 0–1   | Google APIs access in offline mode  |

***

### Data & Storage Resources

| Resource                      | Count | Purpose                       |
| ----------------------------- | ----- | ----------------------------- |
| Cloud SQL (PostgreSQL 17)     | 1     | Main platform database        |
| Cloud SQL (PostgreSQL 17)     | 1     | Temporal workflow database    |
| GCS Bucket (main)             | 1     | Platform object storage       |
| GCS Bucket (knowledge base)   | 1     | KB document storage           |
| GCS Bucket (monitoring)       | 0–1   | Observability data (optional) |
| Vertex AI Vector Search Index | 0–N   | Vector embeddings (optional)  |
| Cloud Firestore               | 0–1   | Agentex state (optional)      |

Both Cloud SQL instances are private-only (no public IP) and connected to the VPC via VPC peering. SSL is enforced for all database connections (`ENCRYPTED_ONLY`).
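
As a client-side illustration of the SSL requirement, the sketch below builds a libpq-style connection URI with `sslmode=require`. The host address, database name, and user are placeholders rather than values from the SGP Terraform, and an application may instead configure TLS through its driver's own settings.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def postgres_uri(host: str, database: str, user: str, sslmode: str = "require") -> str:
    """Build a libpq-style URI that refuses to connect without encryption."""
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{user}@{host}:5432/{database}?{query}"


if __name__ == "__main__":
    # Placeholder private IP, database, and user for illustration only.
    uri = postgres_uri("10.100.0.3", "example_db", "example_user")
    print(uri)
    # Parse the URI back to confirm the SSL setting survived.
    print(parse_qs(urlsplit(uri).query)["sslmode"])
```

A client that connects with `sslmode=disable` would be rejected by an instance in `ENCRYPTED_ONLY` mode, so making the setting explicit in the URI surfaces misconfiguration early.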

***

### Security Resources

| Resource                                                 | Count | Purpose                                     |
| -------------------------------------------------------- | ----- | ------------------------------------------- |
| Secret Manager Secrets                                   | 10+   | Platform configuration and credentials      |
| Cloud KMS Key Ring + Key                                 | 0–1   | Customer Managed Encryption Keys (optional) |
| Service Account (`sgp-<workspace_id>-sa`)                | 1     | Main SGP workload identity SA               |
| Service Account (`sgp-<workspace_id>-node-pool-creator`) | 1     | Node pool creation SA                       |
| Service Account (`sgp-tf-lp-<name>`)                     | 1     | Terraform execution SA (projectsetup)       |
| Workload Identity Pool                                   | 1     | GKE pod → GCP SA binding                    |
| IAP Tunnel (Bastion)                                     | 0–1   | Private cluster access                      |

***

### Artifact Resources (projectsetup phase)

| Resource                   | Count | Purpose              |
| -------------------------- | ----- | -------------------- |
| Artifact Registry (Docker) | 1     | SGP container images |
| Artifact Registry (Helm)   | 1     | SGP Helm charts      |

The registries are named `sgp-<workspace_id>-docker-repository` and `sgp-<workspace_id>-helm-repository`, respectively.
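
To show how these names appear in practice, the sketch below derives the repository names from a workspace ID and composes a container image reference using Artifact Registry's `<location>-docker.pkg.dev/<project>/<repository>/<image>:<tag>` format. The location, project ID, workspace ID, and image name are all hypothetical values chosen for the example.

```python
def registry_names(workspace_id: str) -> dict:
    """Return the Docker and Helm repository names for a workspace."""
    return {
        "docker": f"sgp-{workspace_id}-docker-repository",
        "helm": f"sgp-{workspace_id}-helm-repository",
    }


def docker_image_ref(location: str, project_id: str, workspace_id: str,
                     image: str, tag: str) -> str:
    """Compose a full Artifact Registry image reference."""
    repo = registry_names(workspace_id)["docker"]
    return f"{location}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"


if __name__ == "__main__":
    # All argument values below are placeholders.
    print(docker_image_ref("us-central1", "example-project", "acme", "example-image", "1.0.0"))
```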

***

### Monitoring Resources

| Resource                   | Count | Purpose                                      |
| -------------------------- | ----- | -------------------------------------------- |
| Cloud Logging              | 1     | GKE system components, API server, workloads |
| VPC Flow Logs              | 0–1   | Network traffic sampling (optional)          |
| GKE Vulnerability Scanning | 1     | Basic vulnerability mode enabled by default  |

***

## Network Architecture

### Address Space

```
VPC: sgp-<workspace_id>-network
└── Subnet: sgp-<workspace_id>-network-kubernetes-subnet
    ├── Primary range:   10.0.0.0/16   (65,536 IPs — GKE nodes)
    ├── Secondary range: 10.2.0.0/16   (65,536 IPs — Kubernetes services)
    └── Secondary range: 10.4.0.0/16   (65,536 IPs — Kubernetes pods)

Cloud SQL VPC Peering:
    └── Reserved range:  /16 block      (managed by service networking)

GKE Control Plane (private):
    └── Master CIDR:     10.5.0.0/28   (configurable via private_gke_master_ipv4_cidr_block)
```
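
The address plan above can be sanity-checked with Python's standard `ipaddress` module. The sketch below confirms the size of each range and that none of them overlap; it covers only the ranges with explicit CIDRs, since the Cloud SQL peering range is allocated by service networking and is not listed here. The `RANGES` mapping and `overlapping_pairs` helper are illustrative names.

```python
import ipaddress
from itertools import combinations

# CIDRs copied from the address-space listing above.
RANGES = {
    "nodes (primary)": ipaddress.ip_network("10.0.0.0/16"),
    "services (secondary)": ipaddress.ip_network("10.2.0.0/16"),
    "pods (secondary)": ipaddress.ip_network("10.4.0.0/16"),
    "control plane": ipaddress.ip_network("10.5.0.0/28"),
}


def overlapping_pairs(ranges):
    """Return the labels of every pair of ranges that share addresses."""
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    for label, net in RANGES.items():
        print(f"{label}: {net} -> {net.num_addresses} addresses")
    print("overlaps:", overlapping_pairs(RANGES))
```

Running the same check after overriding `private_gke_master_ipv4_cidr_block`, or before peering the VPC with another network, is a cheap way to catch a colliding range.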

### Traffic Flow

**Ingress (External):**

```
Internet → Global Load Balancer (static IP) → Istio Ingress → Pod
(TLS terminated at load balancer; SSL policy enforces TLS 1.2+)
```

**Database Access:**

```
GKE Pod → VPC Peering → Cloud SQL (private IP)
(SSL enforced; no public IP on Cloud SQL instances)
```

**Google APIs (offline mode):**

```
GKE Pod → Private Google Access → Google APIs (199.36.153.4/30)
(Dedicated route; no default internet gateway route created)
```

**Bastion Access:**

```
Engineer → IAP Tunnel → Bastion VM (e2-micro) → GKE API (private endpoint)
```

***

## GKE Configuration

### Cluster Features

| Feature                    | Value                                        |
| -------------------------- | -------------------------------------------- |
| Datapath provider          | `ADVANCED_DATAPATH` (eBPF-based)             |
| IP stack                   | Dual-stack IPv4/IPv6                         |
| Workload Identity          | Enabled (`<project>.svc.id.goog`)            |
| Secret Manager integration | Enabled                                      |
| Vulnerability scanning     | Basic mode                                   |
| Private nodes              | Enabled (when `offline_mode = true`)         |
| Private endpoint           | Configurable (`enable_gke_private_endpoint`) |
| Master authorized networks | Configurable per deployment                  |
| DNS endpoint               | Enabled (allows external access to the cluster through its DNS endpoint) |

### Workload Identity

GKE pods authenticate to GCP services using Workload Identity rather than node-level service account keys. Kubernetes service accounts are bound to GCP service accounts via the workload identity pool:

```
<project>.svc.id.goog[<namespace>/<k8s-service-account>]
  → GCP Service Account
    → Secret Manager, GCS, Artifact Registry, etc.
```

Key bindings provisioned by Terraform:

| Kubernetes Identity                             | GCP Roles / Access                                              |
| ----------------------------------------------- | --------------------------------------------------------------- |
| `sgp-system-manager` (system-manager namespace) | `secretmanager.viewer`, `secretmanager.secretAccessor`          |
| `sgp-system-manager-pre-install`                | `secretmanager.viewer`, `secretmanager.secretAccessor`          |
| `egp-api-backend` (egp namespace)               | `secretmanager.secretAccessor`, GCS access via SA impersonation |
| `egp-api-backend-db-setup`                      | `secretmanager.secretAccessor`                                  |
| `agents` (agents namespace)                     | `secretmanager.secretAccessor`                                  |
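
For reference, the principal that represents a Kubernetes service account in an IAM binding can be composed as shown below. This is a minimal sketch: the `serviceAccount:<project>.svc.id.goog[<namespace>/<name>]` member format is the one GKE Workload Identity uses, while the project ID `example-project` and the helper name are assumptions for illustration. Such a member is typically granted `roles/iam.workloadIdentityUser` on the target GCP service account.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Return the IAM member string for a Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


if __name__ == "__main__":
    # Namespace and service account names come from the bindings table;
    # the project ID is a placeholder.
    print(workload_identity_member("example-project", "egp", "egp-api-backend"))
    print(workload_identity_member("example-project", "system-manager", "sgp-system-manager"))
```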

***

## Secret Manager Secrets

Key secrets provisioned by Terraform:

| Secret Name                             | Contents                                                                       |
| --------------------------------------- | ------------------------------------------------------------------------------ |
| `<prefix>-system-manager-config`        | System Manager runtime configuration (URLs, workspace ID, cloud provider info) |
| `<prefix>-system-manager-desired-state` | Initial desired state (pack list) — managed externally after first apply       |
| `<prefix>-saml-config-secret`           | SAML IdP configuration                                                         |
| `<prefix>-oidc-config-secret`           | OIDC IdP configuration                                                         |
| `terraform-service-account-key-secret`  | Terraform SA private key (used by `deployments/` as provider credentials)      |

The `secret_name_prefix` variable (typically `sgp-<workspace_id>`) sets the `<prefix>` shown above, keeping secret names distinct when several deployments share one GCP project.
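
The effect of the prefix can be illustrated with a short sketch that expands the prefixed secret names for a deployment and checks that two workspaces in the same project do not collide. The helper name and the workspace IDs `alpha` and `beta` are hypothetical; the suffixes are taken from the table above.

```python
# Suffixes of the prefixed secrets listed in the table above.
PREFIXED_SUFFIXES = (
    "system-manager-config",
    "system-manager-desired-state",
    "saml-config-secret",
    "oidc-config-secret",
)


def secret_names(workspace_id: str, prefix: str = "") -> list:
    """Expand the prefixed secret names, defaulting to the typical prefix."""
    effective_prefix = prefix or f"sgp-{workspace_id}"
    return [f"{effective_prefix}-{suffix}" for suffix in PREFIXED_SUFFIXES]


if __name__ == "__main__":
    alpha = secret_names("alpha")
    beta = secret_names("beta")
    print(alpha[0])
    # Different workspaces never produce the same secret name.
    print(set(alpha).isdisjoint(beta))
```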

***

## Customer Managed Encryption Keys (CMEK)

When `useCustomerManagedEncryptionKey = true`, Terraform provisions a Cloud KMS key ring and symmetric encryption key:

| Resource   | Name                            |
| ---------- | ------------------------------- |
| Key Ring   | `sgp-<workspace_id>-key-ring`   |
| Crypto Key | `sgp-<workspace_id>-crypto-key` |

The key is applied to:

* Cloud SQL instances (both main and Temporal)
* GCS buckets

CMEK is recommended for production deployments to maintain cryptographic control over data at rest.

***

## Optional Capabilities

Each optional capability provisions dedicated infrastructure:

| Capability                   | Variable                           | Additional Resources                                                        |
| ---------------------------- | ---------------------------------- | --------------------------------------------------------------------------- |
| Agentex                      | `deployAgentex`                    | Firestore, dedicated service account, GCS bucket                            |
| Workflows                    | `deployCompass`                    | Cloud SQL database, GCS bucket, service account                             |
| Dex (Document Understanding) | `enable_dex`                       | Cloud SQL (`db-custom-4-15360`), GCS bucket, service account                |
| Reducto                      | `enable_reducto`                   | Cloud SQL (`db-custom-2-7680`), GCS bucket, Vision API key, service account |
| Model Engine                 | `deployModelEngine`                | Vertex AI, additional node pools                                            |
| Monitoring                   | `enable_monitoring`                | GCS bucket, IAM bindings                                                    |
| Cloud Build                  | `enable_cloud_build`               | GCS bucket, Artifact Registry, service accounts                             |
| Vertex AI Search             | `vertex_ai_indices`                | Vector Search indices and endpoints                                         |
| LiveKit                      | `provision_livekit_infrastructure` | GCS bucket (audio), IAM bindings                                            |
