Current Implementation — Demo

Below is the demo-oriented AWS setup we will use when deploying into the customer’s AWS account.
It includes a minimal architecture, detailed IAM roles (with full JSON policies and setup steps), and an order-of-magnitude cost estimate.
Everything here is written so your team can implement it directly.


AWS Architecture (Minimal)

Account & Access

  • We deploy into the customer’s AWS account by assuming customer-provisioned IAM roles (cross-account sts:AssumeRole with an ExternalId).
  • Region: ap-northeast-1 (Tokyo), unless otherwise requested.

Network

  • One VPC (demo grade), single AZ to keep costs small.
  • Public subnets only (to avoid NAT Gateway costs). Tight Security Groups to restrict inbound/outbound.

EKS (Kubernetes)

  • One EKS cluster with a single managed node (e.g., t4g.small; use t3.small if ARM is not available); a creation sketch follows this list.
  • Ingress via the AWS Load Balancer Controller (ALB) with an ACM TLS certificate.
  • Kubernetes add-ons kept to a minimum (no Prometheus/Grafana for the demo unless asked).
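
As an illustration only, a single-node demo cluster of this shape could be created with eksctl roughly as follows; the cluster and nodegroup names are placeholders, and the instance type should be adjusted if ARM is unavailable:

# Illustrative sketch: minimal single-node managed cluster for the demo (names are placeholders).
eksctl create cluster \
  --name <cluster-name> \
  --region ap-northeast-1 \
  --nodegroup-name ng-demo \
  --node-type t4g.small \
  --nodes 1 --nodes-min 1 --nodes-max 1 \
  --managed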

Application Components

  • Portal API as a Deployment (Phase J readiness/liveness/startup probes enabled).
  • Chroma as a StatefulSet with EBS gp3 (≈ 50 GB) for persistence.
  • Cron layer (Phase K) as K8s CronJobs that call our API stages in sequence: extract → translate → package → upsert.
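
As a rough sketch of one stage of that cron layer (the image tag, schedule, service name, and API path below are illustrative placeholders, not the actual endpoints), each stage could be declared as its own CronJob along these lines:

# Illustrative only: one CronJob per pipeline stage; image, schedule, and URL are placeholders.
kubectl -n portal create cronjob extract \
  --image=<acct>.dkr.ecr.ap-northeast-1.amazonaws.com/cron-runner:<tag> \
  --schedule="0 17 * * *" \
  -- /bin/sh -c 'curl -fsS -X POST http://portal-api.portal.svc.cluster.local/<extract-endpoint>'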

Database

  • Amazon RDS for PostgreSQL (Single-AZ), small tier (db.t4g.micro or db.t3.micro), 20–30 GB gp3, short backup retention.
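
For reference, a demo-grade instance like this could be provisioned roughly as follows; the identifier, subnet group, security group, and admin user are placeholders, and credential handling should follow the customer’s own policy:

# Illustrative sketch: small Single-AZ PostgreSQL instance for the demo (placeholders throughout).
aws rds create-db-instance \
  --db-instance-identifier <db-instance-id> \
  --engine postgres \
  --db-instance-class db.t4g.micro \
  --allocated-storage 20 \
  --storage-type gp3 \
  --no-multi-az \
  --backup-retention-period 1 \
  --db-subnet-group-name <subnet-group> \
  --vpc-security-group-ids <sg-id> \
  --master-username <admin-user> \
  --manage-master-user-password \
  --region ap-northeast-1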

Registry / DNS / Certs / Logs

  • ECR for images (portal-api and cron-runner).
  • Route 53 for DNS A/ALIAS to ALB.
  • ACM for TLS certificates (ALB termination).
  • CloudWatch Logs for pod stdout/stderr (API audit logs include X-Request-ID propagated from Phase J).

Nightly cost saver (optional for demo)

  • EventBridge Scheduler + 2 Lambdas:
    • Night (20:05 JST): Stop RDS and scale EKS nodegroup to 0.
    • Morning (07:45 JST): Start RDS (wait for available) and scale nodegroup to 1.
    • Effect: EKS control plane & ALB still bill 24×7, but EC2 node + RDS runtime costs are reduced.
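
The two Lambdas essentially wrap the following calls, shown here as CLI equivalents (all names are placeholders; the exact IAM rights are listed under role C below):

# Night Lambda, expressed as CLI calls:
aws rds stop-db-instance --db-instance-identifier <db-instance-id>
aws eks update-nodegroup-config --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=0,maxSize=1,desiredSize=0

# Morning Lambda, expressed as CLI calls:
aws rds start-db-instance --db-instance-identifier <db-instance-id>
aws eks update-nodegroup-config --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=1,maxSize=1,desiredSize=1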

AWS Roles & Permissions (Minimal) — Create these in the customer account

We assume the following three roles exist in the customer account.
All have a Trust Policy allowing our vendor account to sts:AssumeRole with a shared ExternalId.
We strongly recommend least privilege and, where applicable, permission boundaries or resource-limited ARNs.

0) Trust Policy (attach to each assumable role)

Replace placeholders (<vendor-account-id>, <shared-external-id>) and attach this as the Trust Policy for every vendor role (A/B/C). Optionally restrict by aws:PrincipalArn if you want to allow only a specific role/user from our side.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<vendor-account-id>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<shared-external-id>" }
      }
    }
  ]
}

Session duration: You can set maxSessionDuration (e.g., 1–4 hours).
Permission Boundaries: If your org requires them, attach a boundary policy to cap rights.
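
Once a role exists, the trust policy can be verified from our side roughly like this (the role ARN, ExternalId, and session name are placeholders):

# Illustrative check that the trust policy works; values are placeholders.
aws sts assume-role \
  --role-arn arn:aws:iam::<account-id>:role/VendorEKSDeployer \
  --role-session-name vendor-demo-check \
  --external-id <shared-external-id>
# Expect a JSON response containing temporary Credentials; an AccessDenied error usually
# means the ExternalId or the principal in the trust policy does not match.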


A) VendorRepoPusher — ECR push/pull from our CI

Purpose: Allow our CI to push images for portal-api and cron-runner (and pull for verification).

Policy scope guidelines

  • Restrict to repositories whose names start with a known prefix, e.g., portal-:
    • arn:aws:ecr:<region>:<account-id>:repository/portal-*
  • If we must create repositories, include ecr:CreateRepository. Otherwise omit it and pre-create repos.

IAM Policy JSON (attach to role)

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:CompleteLayerUpload",
        "ecr:InitiateLayerUpload",
        "ecr:PutImage",
        "ecr:UploadLayerPart",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories"
      ],
      "Resource": "arn:aws:ecr:<region>:<account-id>:repository/portal-*"
    }
  ]
}

(Optional) allow creating repos

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:CreateRepository",
        "ecr:BatchCheckLayerAvailability",
        "ecr:CompleteLayerUpload",
        "ecr:InitiateLayerUpload",
        "ecr:PutImage",
        "ecr:UploadLayerPart",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories"
      ],
      "Resource": "*"
    }
  ]
}

Validation steps (customer or CI can run)

  1. From our CI, assume the role and get a temporary session.
  2. aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin <acct>.dkr.ecr.ap-northeast-1.amazonaws.com
  3. docker build -t portal-api:demo ., then docker tag and docker push it as <acct>.dkr.ecr.ap-northeast-1.amazonaws.com/portal-api:<tag>
  4. Expect push success; if AccessDenied, verify role, trust policy, and repo ARN.

B) VendorEKSDeployer — kubectl apply into a limited Namespace

Purpose: Let us deploy only to a specific Namespace (e.g., portal) in the customer’s EKS cluster.

AWS-side IAM (minimal)

  • We only need eks:DescribeCluster to fetch the cluster endpoint and certificate.
  • Restrict the resource to the target cluster ARN.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["eks:DescribeCluster"],
      "Resource": "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>"
    }
  ]
}

Kubernetes-side mapping (performed by the customer’s EKS admins)

  1. Map the IAM role to a Kubernetes group via the aws-auth ConfigMap (cluster-admin task); an eksctl alternative is sketched after the manifests below.

Example aws-auth addition (illustrative; don’t overwrite existing mappings):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::<account-id>:role/VendorEKSDeployer
      username: vendor-deployer
      groups:
        - vendor:portal:deployer

  2. Grant that group admin within only the target Namespace (portal):
# Create Namespace if not present
apiVersion: v1
kind: Namespace
metadata:
  name: portal
---
# Bind the mapped group to Namespace admin
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vendor-namespace-admin
  namespace: portal
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: vendor:portal:deployer   # must match the group in aws-auth
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin   # built-in ClusterRole; the RoleBinding scopes it to the portal Namespace only
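
If eksctl is available, the aws-auth mapping from step 1 can also be added without hand-editing the ConfigMap. This is only a sketch; the customer’s cluster admins may prefer their own tooling:

# Illustrative alternative to hand-editing aws-auth (run by the customer's cluster admins).
eksctl create iamidentitymapping \
  --cluster <cluster-name> \
  --region ap-northeast-1 \
  --arn arn:aws:iam::<account-id>:role/VendorEKSDeployer \
  --username vendor-deployer \
  --group vendor:portal:deployer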

Notes

  • Do not grant cluster-wide admin. Namespace scoping is the principle here.
  • If you prefer a tighter Role (instead of admin), define a custom Role with create/update/patch/get/list/watch on the K8s object kinds we deploy (Deployments, StatefulSets, Services, Ingresses, CronJobs, ConfigMaps, Secrets, HPA, RBAC objects we own), then bind that instead of admin.
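
One way to bootstrap such a custom Role is kubectl’s imperative form; the verb and resource lists below are an assumption and should be trimmed to exactly what we deploy:

# Illustrative starting point for a least-privilege Role instead of the built-in admin ClusterRole.
kubectl -n portal create role portal-deployer \
  --verb=get,list,watch,create,update,patch,delete \
  --resource=deployments,statefulsets,services,ingresses,cronjobs,configmaps,secrets,horizontalpodautoscalers
kubectl -n portal create rolebinding vendor-namespace-deployer \
  --role=portal-deployer \
  --group=vendor:portal:deployer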

Validation steps

  • With the assumed role, run: aws eks update-kubeconfig --name <cluster-name> --region ap-northeast-1 --role-arn arn:aws:iam::<account-id>:role/VendorEKSDeployer
  • kubectl -n portal get deploy should work; kubectl -n kube-system get pods should fail with RBAC 403 (expected).

C) (Optional) NightSwitchLambdaRole — Nightly stop/start (RDS + NodeGroup)

Purpose: Used only if we enable the demo nightly cost saver. Grants a Lambda function enough rights to stop/start RDS and scale the EKS nodegroup.

IAM Policy JSON (attach to the Lambda’s execution role)

{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Effect":"Allow",
      "Action":[ "logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents" ],
      "Resource":"*"
    },
    {
      "Effect":"Allow",
      "Action":[ "rds:StopDBInstance","rds:StartDBInstance","rds:DescribeDBInstances" ],
      "Resource":"arn:aws:rds:<region>:<account-id>:db:<db-instance-id>"
    },
    {
      "Effect":"Allow",
      "Action":[ "eks:UpdateNodegroupConfig","eks:DescribeNodegroup","eks:DescribeCluster" ],
      "Resource":[
        "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>",
        "arn:aws:eks:<region>:<account-id>:nodegroup/<cluster-name>/<nodegroup-name>/*"
      ]
    }
  ]
}

Environment variables for the Lambda code

  • AWS_REGION is provided automatically by the Lambda runtime (it is a reserved key and cannot be set manually); it will already be ap-northeast-1 when deployed there.
  • DB_INSTANCE_IDENTIFIER=<db-instance-id>
  • EKS_CLUSTER_NAME=<cluster-name>
  • EKS_NODEGROUP_NAME=<nodegroup-name>

EventBridge Scheduler

  • Night stop (UTC cron that maps to 20:05 JST): cron(5 11 * * ? *)
  • Morning start (UTC cron that maps to 07:45 JST): cron(45 22 * * ? *)
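
A sketch of creating one of these schedules (schedule name, Lambda ARN, and the Scheduler invoke role are placeholders; EventBridge Scheduler also accepts --schedule-expression-timezone, e.g., Asia/Tokyo, if you prefer to express the cron directly in JST):

# Illustrative: nightly stop schedule invoking the stop Lambda (placeholders throughout).
aws scheduler create-schedule \
  --name portal-demo-night-stop \
  --schedule-expression "cron(5 11 * * ? *)" \
  --flexible-time-window Mode=OFF \
  --target '{"Arn":"arn:aws:lambda:ap-northeast-1:<account-id>:function:<stop-function>","RoleArn":"arn:aws:iam::<account-id>:role/<scheduler-invoke-role>"}'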

Estimated Monthly Cost (Order of Magnitude)

Assumptions: Tokyo, on-demand, ~730 h/month, very low traffic.

  • EKS control plane: ~$73/mo
  • EC2 node (t4g.small ×1): ~$12/mo
  • ALB: base ~$16–18/mo (+ a few dollars for LCU if idle/low traffic)
  • RDS (db.t4g.micro, Single-AZ): ~$12/mo
  • EBS gp3 (Chroma 50 GB): ~$4–5/mo
  • RDS storage (20 GB): ~$2–3/mo
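
As a sanity check, these line items sum to roughly 73 + 12 + 17 + 12 + 4.5 + 2.5 ≈ $121, the low end of the range below.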

Total (always on): ~$120–$130 / month
With nightly stop (node + RDS off outside roughly 07:45–20:05 JST, per the schedules above): ~$108–$115 / month

Savings are modest because EKS control plane & ALB are billed 24×7.


For Reference — Actual Customer Use (Production-ish)

Same overall structure as the demo; differences below are what we recommend when moving beyond a demo.

Architecture — Differences

  • Multi-AZ subnets and 3× nodes for EKS (e.g., m6g.large or m6i.large), enable HPA/PDB, and NetworkPolicy (Calico/Cilium).
  • RDS PostgreSQL Multi-AZ (e.g., db.m6g.large) with PITR and larger storage.
  • Chroma: Still on EKS, larger gp3 (e.g., 200 GB), scheduled snapshots/Velero backups.
  • Ingress: ALB + WAF (and optionally AWS Shield).
  • Private subnets + NAT Gateways (per AZ); add VPC endpoints (S3, DynamoDB) to reduce NAT data costs.
  • Secrets via AWS Secrets Manager + External Secrets Operator (IRSA).
  • Auth: Managed IdP (Cognito or corporate OIDC) for JWT, OAuth2 client-credentials for Cron.
  • Observability: CloudWatch + optionally AMP/AMG (Prometheus/Grafana).
  • GitOps: Argo CD/Flux; optionally Argo Rollouts for progressive delivery.

Roles — Differences

  • Add a VendorIRSASetup role (limited to IAM roles under the /eksirsa/* path) to create/attach IRSA roles for the ALB Controller, External Secrets, etc.
  • External Secrets’ IRSA permissions restricted to specific Secrets Manager ARNs.
  • If we manage DNS/ACM: add Route 53 and ACM permissions restricted to specific hosted zones and region.

Cost — Differences

  • With 3× application nodes, RDS Multi-AZ, WAF, NAT ×3, and 200 GB EBS: expect roughly $600/month (give or take) as a starting point.
  • Real costs vary with traffic (ALB LCUs), NAT data, and instance sizes; a precise quote needs the AWS Pricing Calculator.

Hand-off checklist for the customer’s IAM/admin team (applies to both demo and prod)

  1. Create the three roles above (VendorRepoPusher, VendorEKSDeployer, (optional) NightSwitchLambdaRole) with the Trust Policy (ExternalId) and the exact JSON policies shown.
  2. In EKS, update aws-auth to map VendorEKSDeployer to a Namespace-scoped group and bind it to admin (or a custom least-privilege Role) in the portal Namespace only.
  3. Create ECR repos named portal-api and cron-runner (or any portal-* prefix) and grant VendorRepoPusher push rights.
  4. (Optional) Set up the two Lambda functions and EventBridge schedules for nightly stop/start.
  5. Share the role ARNs, ExternalId, cluster name, repo names, and DB instance id with us.
