Below is the demo-oriented AWS setup we will use when deploying into the customer’s AWS account.
It includes a minimal architecture, detailed IAM roles (with full JSON policies and setup steps), and an order-of-magnitude cost estimate.
Everything here is written so your team can implement it directly.
AWS Architecture (Minimal)
Account & Access
- We deploy into the customer’s AWS account by assuming customer-provisioned IAM roles (cross-account STS:AssumeRole with ExternalId).
- Region: ap-northeast-1 (Tokyo), unless otherwise requested.
Network
- One VPC (demo grade), single AZ to keep costs small.
- Public subnets only (to avoid NAT Gateway costs). Tight Security Groups to restrict inbound/outbound.
EKS (Kubernetes)
- One EKS cluster with a single managed node (e.g., t4g.small; use t3.small if ARM is not available).
- Ingress via the AWS Load Balancer Controller → ALB with an ACM TLS certificate.
- Kubernetes add-ons kept to a minimum (no Prometheus/Grafana for the demo unless asked).
Application Components
- Portal API as a Deployment (Phase J readiness/liveness/startup probes enabled).
- Chroma as a StatefulSet with EBS gp3 (≈ 50 GB) for persistence.
- Cron layer (Phase K) as K8s CronJobs that call our API stages in sequence: extract → translate → package → upsert.
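For illustration, the cron-runner can be a thin shell wrapper that calls the four stages in order and aborts on the first failure. This is a sketch only; the stage endpoints and the CRON_API_BASE / CRON_API_TOKEN variables are assumptions, not the actual Phase K contract.
#!/usr/bin/env bash
# Hypothetical cron-runner entrypoint: run the pipeline stages in order, stop on the first failure.
set -euo pipefail

API="${CRON_API_BASE:-http://portal-api.portal.svc.cluster.local:8080}"   # assumed in-cluster Service DNS
for stage in extract translate package upsert; do
  echo "Running stage: ${stage}"
  curl -fsS -X POST "${API}/cron/${stage}" \
    -H "Authorization: Bearer ${CRON_API_TOKEN}" \
    -H "X-Request-ID: $(date +%s)-${stage}"        # propagate a request id, as in Phase J
done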
Database
- Amazon RDS for PostgreSQL (Single-AZ), small tier (db.t4g.micro or db.t3.micro), 20–30 GB gp3, short backup retention.
Registry / DNS / Certs / Logs
- ECR for images (portal-api and cron-runner).
- Route 53 for DNS A/ALIAS to ALB.
- ACM for TLS certificates (ALB termination).
- CloudWatch Logs for pod stdout/stderr (API audit logs include the X-Request-ID propagated from Phase J).
Nightly cost saver (optional for demo)
- EventBridge Scheduler + 2 Lambdas:
  - Night (20:05 JST): Stop RDS and scale the EKS nodegroup to 0.
  - Morning (07:45 JST): Start RDS (wait for the available state) and scale the nodegroup back to 1.
- Effect: the EKS control plane & ALB still bill 24×7, but EC2 node + RDS runtime costs are reduced.
AWS Roles & Permissions (Minimal) — Create these in the customer account
We assume the following three roles exist in the customer account.
All have a Trust Policy allowing our vendor account to sts:AssumeRole with a shared ExternalId.
We strongly recommend least privilege and, where applicable, permission boundaries or resource-limited ARNs.
0) Trust Policy (attach to each assumable role)
Replace placeholders (<vendor-account-id>, <shared-external-id>) and attach this as the Trust Policy for every vendor role (A/B/C). Optionally restrict by aws:PrincipalArn if you want to allow only a specific role/user from our side.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::<vendor-account-id>:root" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "sts:ExternalId": "<shared-external-id>" }
}
}
]
}
Session duration: You can set maxSessionDuration (e.g., 1–4 hours).
Permission Boundaries: If your org requires them, attach a boundary policy to cap rights.
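For reference, a minimal sketch of creating one of these roles from the customer side, assuming the trust policy above is saved to vendor-trust-policy.json and a 4-hour session cap is wanted (repeat per role A/B/C; the role name and file path are placeholders):
# Create an assumable vendor role with the trust policy above and a 4-hour session cap
aws iam create-role \
  --role-name VendorRepoPusher \
  --assume-role-policy-document file://vendor-trust-policy.json \
  --max-session-duration 14400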
A) VendorRepoPusher — ECR push/pull from our CI
Purpose: Allow our CI to push images for portal-api and cron-runner (and pull for verification).
Policy scope guidelines
- Restrict to repositories whose names start with a known prefix, e.g. portal-: arn:aws:ecr:<region>:<account-id>:repository/portal-*
- If we must create repositories, include ecr:CreateRepository. Otherwise omit it and pre-create the repos.
IAM Policy JSON (attach to role)
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*" },
{
"Effect": "Allow",
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:CompleteLayerUpload",
"ecr:InitiateLayerUpload",
"ecr:PutImage",
"ecr:UploadLayerPart",
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer",
"ecr:DescribeRepositories"
],
"Resource": "arn:aws:ecr:<region>:<account-id>:repository/portal-*"
}
]
}
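One way to attach the policy above as an inline policy (the policy name and file name are illustrative):
# Attach the ECR push/pull policy to the role created in step 0
aws iam put-role-policy \
  --role-name VendorRepoPusher \
  --policy-name vendor-ecr-push \
  --policy-document file://vendor-repo-pusher-policy.json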
(Optional) allow creating repos
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*" },
{
"Effect": "Allow",
"Action": [
"ecr:CreateRepository",
"ecr:BatchCheckLayerAvailability",
"ecr:CompleteLayerUpload",
"ecr:InitiateLayerUpload",
"ecr:PutImage",
"ecr:UploadLayerPart",
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer",
"ecr:DescribeRepositories"
],
"Resource": "*"
}
]
}
Validation steps (customer or CI can run)
- From our CI, assume the role and get a temporary session.
- aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin <acct>.dkr.ecr.ap-northeast-1.amazonaws.com
- docker build -t portal-api:demo ., then docker tag and docker push <acct>.dkr.ecr.ap-northeast-1.amazonaws.com/portal-api:<tag>
- Expect the push to succeed; if AccessDenied, verify the role, trust policy, and repo ARN.
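A sketch of the full sequence our CI would run, assuming the role name VendorRepoPusher and placeholder account, ExternalId, and tag values:
# 1) Assume the customer role with the shared ExternalId and export temporary credentials
CREDS=$(aws sts assume-role \
  --role-arn arn:aws:iam::<account-id>:role/VendorRepoPusher \
  --role-session-name ci-push \
  --external-id <shared-external-id> \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text)
read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$CREDS"
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN

# 2) Log in to ECR, then build, tag, and push
aws ecr get-login-password --region ap-northeast-1 \
  | docker login --username AWS --password-stdin <acct>.dkr.ecr.ap-northeast-1.amazonaws.com
docker build -t portal-api:demo .
docker tag portal-api:demo <acct>.dkr.ecr.ap-northeast-1.amazonaws.com/portal-api:<tag>
docker push <acct>.dkr.ecr.ap-northeast-1.amazonaws.com/portal-api:<tag>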
B) VendorEKSDeployer — kubectl apply into a limited Namespace
Purpose: Let us deploy only to a specific Namespace (e.g., portal) in the customer’s EKS cluster.
AWS-side IAM (minimal)
- We only need eks:DescribeCluster to fetch the cluster endpoint and certificate.
- Restrict the resource to the target cluster ARN.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["eks:DescribeCluster"],
"Resource": "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>"
}
]
}
Kubernetes-side mapping (performed by the customer’s EKS admins)
- Map the IAM role to a Kubernetes group via the aws-auth ConfigMap (a cluster-admin task).
Example aws-auth addition (illustrative; don’t overwrite existing mappings):
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::<account-id>:role/VendorEKSDeployer
      username: vendor-deployer
      groups:
        - vendor:portal:deployer
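If the cluster admins prefer not to edit the ConfigMap by hand, eksctl can append the same mapping (a sketch; assumes eksctl is available and the placeholders are filled in):
# Equivalent mapping via eksctl (appends to aws-auth instead of overwriting it)
eksctl create iamidentitymapping \
  --cluster <cluster-name> \
  --region ap-northeast-1 \
  --arn arn:aws:iam::<account-id>:role/VendorEKSDeployer \
  --username vendor-deployer \
  --group vendor:portal:deployer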
- Grant that group admin rights within only the target Namespace (portal):
# Create Namespace if not present
apiVersion: v1
kind: Namespace
metadata:
  name: portal
---
# Bind the mapped group to Namespace admin
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vendor-namespace-admin
  namespace: portal
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: vendor:portal:deployer   # must match the group in aws-auth
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole   # the built-in "admin" role is a ClusterRole; the RoleBinding scopes it to this Namespace
  name: admin
Notes
- Do not grant cluster-wide admin. Namespace scoping is the principle here.
- If you prefer a tighter role (instead of admin), define a custom Role with create/update/patch/get/list/watch on the K8s object kinds we deploy (Deployments, StatefulSets, Services, Ingresses, CronJobs, ConfigMaps, Secrets, HPA, and the RBAC objects we own), then bind that instead of admin.
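A sketch of that tighter option using kubectl (the Role name and resource list are illustrative; adjust them to the objects we actually ship):
# Namespace-scoped least-privilege Role instead of the built-in "admin" ClusterRole
kubectl -n portal create role vendor-portal-deployer \
  --verb=get,list,watch,create,update,patch \
  --resource=deployments,statefulsets,services,ingresses,cronjobs,configmaps,secrets,horizontalpodautoscalers
kubectl -n portal create rolebinding vendor-portal-deployer-binding \
  --role=vendor-portal-deployer \
  --group=vendor:portal:deployer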
Validation steps
- With the assumed role, run: aws eks update-kubeconfig --name <cluster-name> --region ap-northeast-1 --role-arn arn:aws:iam::<account-id>:role/VendorEKSDeployer
- kubectl -n portal get deploy should work; kubectl -n kube-system get pods should fail with an RBAC 403 (expected).
C) (Optional) NightSwitchLambdaRole — Nightly stop/start (RDS + NodeGroup)
Purpose: Used only if we enable the demo nightly cost saver. Grants a Lambda function enough rights to stop/start RDS and scale the EKS nodegroup.
IAM Policy JSON (attach to the Lambda’s execution role)
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Action":[ "logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents" ],
"Resource":"*"
},
{
"Effect":"Allow",
"Action":[ "rds:StopDBInstance","rds:StartDBInstance","rds:DescribeDBInstances" ],
"Resource":"arn:aws:rds:<region>:<account-id>:db:<db-instance-id>"
},
{
"Effect":"Allow",
"Action":[ "eks:UpdateNodegroupConfig","eks:DescribeNodegroup","eks:DescribeCluster" ],
"Resource":[
"arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>",
"arn:aws:eks:<region>:<account-id>:nodegroup/<cluster-name>/<nodegroup-name>/*"
]
}
]
}
Environment variables for the Lambda code
AWS_REGION=ap-northeast-1 (provided automatically by the Lambda runtime; it cannot be set as a custom variable)
DB_INSTANCE_IDENTIFIER=<db-instance-id>
EKS_CLUSTER_NAME=<cluster-name>
EKS_NODEGROUP_NAME=<nodegroup-name>
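For illustration, the CLI equivalents of what the two Lambdas do with those variables (a sketch of the behaviour, not the actual Lambda code):
# Night (20:05 JST): stop RDS and scale the nodegroup to 0
aws rds stop-db-instance --db-instance-identifier "$DB_INSTANCE_IDENTIFIER"
aws eks update-nodegroup-config --cluster-name "$EKS_CLUSTER_NAME" \
  --nodegroup-name "$EKS_NODEGROUP_NAME" \
  --scaling-config minSize=0,maxSize=1,desiredSize=0

# Morning (07:45 JST): start RDS, wait until it is available, then scale back to 1
aws rds start-db-instance --db-instance-identifier "$DB_INSTANCE_IDENTIFIER"
aws rds wait db-instance-available --db-instance-identifier "$DB_INSTANCE_IDENTIFIER"
aws eks update-nodegroup-config --cluster-name "$EKS_CLUSTER_NAME" \
  --nodegroup-name "$EKS_NODEGROUP_NAME" \
  --scaling-config minSize=0,maxSize=1,desiredSize=1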
EventBridge Scheduler
- Night stop (UTC cron that maps to 20:05 JST): cron(5 11 * * ? *)
- Morning start (UTC cron that maps to 07:45 JST): cron(45 22 * * ? *)
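A sketch of creating the night-stop schedule (the schedule name, Lambda ARN, and scheduler invoke role are placeholders; EventBridge Scheduler also accepts a timezone, which lets us write 20:05 JST directly instead of converting to UTC):
# Night-stop schedule, expressed in JST via --schedule-expression-timezone
aws scheduler create-schedule \
  --name portal-demo-night-stop \
  --schedule-expression "cron(5 20 * * ? *)" \
  --schedule-expression-timezone "Asia/Tokyo" \
  --flexible-time-window '{"Mode": "OFF"}' \
  --target '{"Arn": "<night-stop-lambda-arn>", "RoleArn": "<scheduler-invoke-role-arn>"}'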
Estimated Monthly Cost (Order of Magnitude)
Assumptions: Tokyo, on-demand, ~730 h/month, very low traffic.
- EKS control plane: ~$73/mo
- EC2 node (t4g.small ×1): ~$12/mo
- ALB: base ~$16–18/mo (+ a few dollars for LCU if idle/low traffic)
- RDS (db.t4g.micro, Single-AZ): ~$12/mo
- EBS gp3 (Chroma 50 GB): ~$4–5/mo
- RDS storage (20 GB): ~$2–3/mo
Total (always on): ~$120–$130 / month
With nightly stop (nodes + RDS off outside 09:00–20:00 JST): ~$108–$115 / month
Savings are modest because EKS control plane & ALB are billed 24×7.
For Reference — Actual Customer Use (Production-ish)
Same overall structure as the demo; differences below are what we recommend when moving beyond a demo.
Architecture — Differences
- Multi-AZ subnets and 3× nodes for EKS (e.g., m6g.large or m6i.large); enable HPA/PDB and NetworkPolicy (Calico/Cilium).
- RDS PostgreSQL Multi-AZ (e.g., db.m6g.large) with PITR and larger storage.
- Chroma: still on EKS, larger gp3 (e.g., 200 GB), scheduled snapshots/Velero backups.
- Ingress: ALB + WAF (and optionally AWS Shield).
- Private subnets + NAT Gateways (per AZ); add VPC endpoints (S3, DynamoDB) to reduce NAT data costs.
- Secrets via AWS Secrets Manager + External Secrets Operator (IRSA).
- Auth: Managed IdP (Cognito or corporate OIDC) for JWT, OAuth2 client-credentials for Cron.
- Observability: CloudWatch + optionally AMP/AMG (Prometheus/Grafana).
- GitOps: Argo CD/Flux; optionally Argo Rollouts for progressive delivery.
Roles — Differences
- Add a VendorIRSASetup role (limited to the /eksirsa/* path) to create/attach IRSA roles for the ALB Controller, External Secrets, etc.
- External Secrets' IRSA permissions restricted to specific Secrets Manager ARNs.
- If we manage DNS/ACM: add Route 53 and ACM permissions restricted to specific hosted zones and region.
Cost — Differences
- With 3× application nodes, RDS Multi-AZ, WAF, NAT ×3, and 200 GB EBS: expect roughly $600/month (give or take) as a starting point.
- Real costs vary with traffic (ALB LCUs), NAT data, and instance sizes; a precise quote needs the AWS Pricing Calculator.
Hand-off checklist for the customer’s IAM/admin team (applies to both demo and prod)
- Create the three roles above (VendorRepoPusher, VendorEKSDeployer, (optional) NightSwitchLambdaRole) with the Trust Policy (ExternalId) and the exact JSON policies shown.
- In EKS, update aws-auth to map VendorEKSDeployer to a Namespace-scoped group and bind it to admin (or a custom least-privilege Role) in the portal Namespace only.
- Create ECR repos named portal-api and cron-runner (or any portal-* prefix) and grant VendorRepoPusher push rights (a CLI sketch follows this checklist).
- (Optional) Set up the two Lambda functions and EventBridge schedules for the nightly stop/start.
- Share the role ARNs, ExternalId, cluster name, repo names, and DB instance id with us.
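For the ECR step, a minimal sketch (repository names as listed above):
# Pre-create the two repositories the vendor CI will push to
aws ecr create-repository --repository-name portal-api --region ap-northeast-1
aws ecr create-repository --repository-name cron-runner --region ap-northeast-1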