fix: clean up orphaned mapt VPCs without NAT gws #299

Open
psturc wants to merge 3 commits into konflux-ci:main from psturc:fix-mapt-deletion-script

psturc (Member) commented Jun 12, 2026

Summary

  • Add orphaned VPC cleanup for mapt clusters that don't create NAT gateways
    (kind clusters use NatGatewayModeNone — confirmed in mapt source code)
  • VPCs are discovered by origin=mapt tag, then filtered by EC2 instance
    presence: active instances → skip, no instances → safe to delete
  • Add LB deletion polling + skip ELB-managed ENIs to prevent async race
    conditions during cleanup

Problem

The cleanup script discovered VPCs exclusively via NAT gateways tagged
origin=mapt. Kind clusters never create NAT gateways, making them
invisible to the script. This caused ~287 orphaned VPCs to accumulate
(117 in us-west-2, 143 in us-east-1, 27 in us-east-2) along with
associated IGWs, subnets, security groups, route tables, and load balancers.

Test plan

  • Dry-run against us-west-2 — verified only orphaned VPCs targeted
  • Real run against us-west-2 — 117 VPCs cleaned, zero errors
  • Dry-run against us-east-1 — 136 orphaned VPCs correctly identified
  • Verified active clusters (with running instances) are skipped
  • Verified LB async deletion race condition is fixed

CI test

psturc force-pushed the fix-mapt-deletion-script branch from a3f6367 to 7bb6288 on June 12, 2026 14:00

qodo-app-for-konflux-ci (Bot) commented Jun 12, 2026

Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (0)

Action required

1. EIP release may fail 🐞 Bug ☼ Reliability ⭐ New
Description
In the ELB-managed ENI branch, the script queries EIPs by NetworkInterfaceId (i.e., currently
associated) and then calls the generic EIP deleter which only runs aws ec2 release-address without
disassociating first, so EIP cleanup can fail and block later dependency deletion. The loop also
doesn’t guard against AWS CLI text output returning the literal string None, which can trigger an
invalid release-address --allocation-id None call.
Code

scripts/mapt/delete-mapt-clusters.sh[R556-560]

+              eips_data=$(aws ec2 describe-addresses --region "$region" --query "Addresses[?NetworkInterfaceId=='$eni_id'].AllocationId" --output text 2>/dev/null)
+              for eip_alloc_id in $eips_data; do
+                  delete_resource "EIP" "$eip_alloc_id" "$region" "ELB ENI EIP release"
+                  regional_eips=$((regional_eips + 1))
+              done
Relevance

⭐⭐ Medium

No historical suggestions found. The prior script in PR #235 used release-address without a
disassociate step or None guard, which suggests this was not enforced.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The ELB-ENI branch gathers EIP allocation IDs based on NetworkInterfaceId and passes them to
delete_resource "EIP", but the EIP delete implementation only executes aws ec2 release-address
(no disassociation step). The script already acknowledges the AWS CLI None sentinel elsewhere, but
this new EIP loop doesn’t guard against it.

scripts/mapt/delete-mapt-clusters.sh[553-561]
scripts/mapt/delete-mapt-clusters.sh[119-121]
scripts/mapt/delete-mapt-clusters.sh[88-90]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The orphan-VPC cleanup’s ELB-ENI path attempts to release EIPs that are still associated to an ENI by calling `delete_resource "EIP"`, but `delete_resource` uses `aws ec2 release-address` only (no disassociation). This can fail and leave resources that later prevent IGW detach / VPC delete. The loop can also attempt to release a literal `None` allocation id.

### Issue Context
This happens only for ENIs with `InterfaceType` corresponding to ELB-managed ENIs (the branch that currently does `continue` after EIP handling).

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[556-560]
- scripts/mapt/delete-mapt-clusters.sh[101-131]

### Suggested fix approach
1. When looking up EIPs for an ENI, also retrieve `AssociationId` (or detect association via `describe-addresses`).
2. If an EIP is associated, disassociate it first (`aws ec2 disassociate-address --association-id ...`), then release by allocation id.
3. Treat `None`/empty results as “no EIPs” (skip the loop) so you don’t call `release-address` with `None`.
4. (Optional but safer) Only increment `regional_eips` after the disassociate/release succeeds (check exit codes).
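
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `release_eni_eips` is a hypothetical helper name that does not exist in the script, and it calls the AWS CLI directly instead of going through `delete_resource`.

```shell
# Hypothetical helper (not in the script): release every EIP attached to one ENI.
# Prints the number of EIPs that were actually released.
release_eni_eips() {
    local eni_id="$1" region="$2"
    local eips alloc_id assoc_id released=0

    # One line per EIP: AllocationId <TAB> AssociationId
    eips=$(aws ec2 describe-addresses --region "$region" \
        --query "Addresses[?NetworkInterfaceId=='$eni_id'].[AllocationId,AssociationId]" \
        --output text 2>/dev/null)

    while read -r alloc_id assoc_id; do
        # --output text prints the literal string None for null results.
        if [ -z "$alloc_id" ] || [ "$alloc_id" = "None" ]; then
            continue
        fi
        # Disassociate first; releasing a still-associated EIP can fail.
        if [ -n "$assoc_id" ] && [ "$assoc_id" != "None" ]; then
            aws ec2 disassociate-address --region "$region" \
                --association-id "$assoc_id" >/dev/null || continue
        fi
        # Count the EIP only when the release call succeeds.
        if aws ec2 release-address --region "$region" \
            --allocation-id "$alloc_id" >/dev/null; then
            released=$((released + 1))
        fi
    done <<EOF
$eips
EOF
    echo "$released"
}
```

A caller could then add the printed count to `regional_eips`, for example `regional_eips=$((regional_eips + $(release_eni_eips "$eni_id" "$region")))`.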



2. None breaks orphan checks 🐞 Bug ≡ Correctness
Description
In the orphan-VPC path, active_instances and terminated_data are treated as “present” based only
on -n, so an AWS CLI --output text result of the literal string None will incorrectly skip VPC
deletion or feed None into get_age_seconds() and trip the age guard. This can prevent orphaned
VPC cleanup from ever running for some VPCs, re-creating the leak this PR is intended to fix.
Code

scripts/mapt/delete-mapt-clusters.sh[R459-490]

+      # Check for any non-terminated EC2 instances in this VPC
+      active_instances=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" \
+            --query 'Reservations[].Instances[?State.Name!=`terminated`].InstanceId' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$active_instances" ]; then
+          continue
+      fi
+
+      # Check for terminated instances to apply the age limit
+      terminated_data=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" "Name=instance-state-name,Values=terminated" \
+            --query 'Reservations[].Instances[].LaunchTime' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$terminated_data" ]; then
+          skip_vpc=false
+          for launch_time in $terminated_data; do
+              age_seconds=$(get_age_seconds "$launch_time")
+              if [ "$age_seconds" -le "$AGE_LIMIT_SECONDS" ]; then
+                  skip_vpc=true
+                  break
+              fi
+          done
+          if $skip_vpc; then
+              continue
+          fi
Relevance

⭐⭐⭐ High

The repo already guards against the AWS CLI text output None elsewhere, so a consistent
None-handling fix is likely to be accepted.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The script explicitly handles None as a possible AWS CLI --output text value elsewhere, but the
new orphan-VPC logic only checks -n and can therefore mis-handle None and skip cleanup or
miscompute age.

scripts/mapt/delete-mapt-clusters.sh[88-90]
scripts/mapt/delete-mapt-clusters.sh[459-490]
scripts/mapt/delete-mapt-clusters.sh[500-508]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The orphan-VPC cleanup treats any non-empty `--output text` result as real data. When AWS CLI returns the literal string `None`, the script either:
- skips deletion (`active_instances` check), or
- treats `None` as a timestamp, making `get_age_seconds` return `0` and causing the VPC to be skipped as “too new”.

### Issue Context
This script already anticipates `None` from AWS CLI text output in other locations, but the new orphan-VPC branch does not.

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[459-490]
- scripts/mapt/delete-mapt-clusters.sh[492-509]

### Proposed fix
- After each AWS CLI `--output text` capture used as a presence check, normalize `None` to empty (or explicitly check `!= "None"`).
 - Example:
   - `if [ -n "$active_instances" ] && [ "$active_instances" != "None" ]; then ...`
   - `if [ -n "$terminated_data" ] && [ "$terminated_data" != "None" ]; then ... else ...`
- Similarly, when iterating timestamps, skip non-timestamps (e.g., `None`) before calling `get_age_seconds`.
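
One way to apply the normalization described above is a small filter run on every `--output text` capture before the `-n` check. The sketch below assumes a hypothetical helper named `normalize_none`; it is not part of the script.

```shell
# Hypothetical helper (not in the script): drop every literal "None" token
# from a whitespace-separated AWS CLI text result and print what is left.
normalize_none() {
    local word out=""
    # Unquoted on purpose: split the captured text on spaces, tabs and newlines.
    for word in $1; do
        if [ "$word" = "None" ]; then
            continue
        fi
        out="${out:+$out }$word"
    done
    printf '%s' "$out"
}
```

With this in place, `active_instances=$(normalize_none "$active_instances")` followed by the existing `[ -n "$active_instances" ]` check would treat a bare `None` as "no instances", and the timestamp loop would never see `None` as a launch time.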



3. Unscoped EIP deletion 🐞 Bug ≡ Correctness
Description
In the orphan-VPC cleanup path, standalone_eips is computed only by origin=mapt and
AssociationId==null, so any unassociated tagged EIP in the *region* can be released while
processing an unrelated orphan VPC. This can break other mapt clusters/workflows and will be
re-attempted once per orphan VPC processed.
Code

scripts/mapt/delete-mapt-clusters.sh[R549-555]

+      # --- 3.3 Standalone EIPs tagged with origin=mapt in this VPC ---
+      # (EIPs not attached to an ENI won't be caught above)
+      standalone_eips=$(aws ec2 describe-addresses --region "$region" --filters "$FILTER" --query "Addresses[?AssociationId==null].AllocationId" --output text 2>/dev/null)
+      for eip_alloc_id in $standalone_eips; do
+          delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP"
+          regional_eips=$((regional_eips + 1))
+      done
Relevance

⭐⭐ Medium

No prior evidence on scoping EIP release to VPC/project; PR #235 used broad tag-based EIP cleanup.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The script’s comment claims these are standalone EIPs “in this VPC”, but the actual query only
filters by the global origin tag and association null, which is region-wide and not scoped to the
current vpc_id or project_name. The README also describes the script as cleaning up
cluster-associated resources, implying cleanup should be scoped to the target cluster/VPC.

scripts/mapt/delete-mapt-clusters.sh[549-555]
scripts/mapt/README.md[3-6]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The orphan-VPC cleanup loop releases **all** unassociated EIPs tagged `origin=mapt` in the region, even if they do not belong to the orphan VPC currently being cleaned up.

## Issue Context
EIPs do not have a direct VPC linkage when unassociated, so a tag-only regional query can impact unrelated clusters.

## Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[549-555]

## Suggested fix approach
- Remove the per-orphan-VPC regional EIP deletion loop, OR
- Restrict EIP cleanup to EIPs that can be safely attributed to the orphan VPC/project, e.g.:
 - Filter by `projectName==$project_name` (if EIPs are tagged with projectName), and/or
 - Only delete EIPs that were previously attached to ENIs in the orphan VPC (track those ENI->EIP allocation IDs before deletion), and/or
 - Run a single regional pass *after* computing a set of orphan project names, and delete only EIPs whose tags match those orphan projects.
- Ensure the EIP cleanup is not executed repeatedly per VPC (avoid duplicate delete attempts).
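
The "single regional pass" option could look roughly like the sketch below. It assumes the caller has already produced one `ALLOCATION_ID PROJECT_NAME` pair per line for the unassociated EIPs in the region (how those pairs are extracted from tags is left open), and `select_orphan_eips` is a hypothetical helper name.

```shell
# Hypothetical helper (not in the script).
# stdin: one "ALLOCATION_ID PROJECT_NAME" pair per line (whitespace separated)
# $1:    newline-separated list of orphan project names
# Prints each allocation ID that belongs to an orphan project, once.
select_orphan_eips() {
    local orphan_projects="$1" alloc_id project seen=""
    local nl='
'
    while read -r alloc_id project; do
        # Ignore blanks and the AWS CLI text-mode None sentinel.
        case "$alloc_id" in ''|None) continue ;; esac
        case "$project" in ''|None) continue ;; esac
        # Exact whole-line membership test; quoted case patterns are literal.
        case "${nl}${orphan_projects}${nl}" in
            *"${nl}${project}${nl}"*) ;;
            *) continue ;;
        esac
        # Skip allocation IDs that were already emitted.
        case "${nl}${seen}${nl}" in
            *"${nl}${alloc_id}${nl}"*) continue ;;
        esac
        seen="${seen}${nl}${alloc_id}"
        printf '%s\n' "$alloc_id"
    done
}
```

Running this once per region, after all orphan VPCs have been identified, avoids both the unrelated releases and the repeated delete attempts per VPC.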



4. Orphan VPC bypasses age 🐞 Bug ≡ Correctness
Description
The orphan-VPC path marks a VPC for deletion whenever there are no active instances, but if the
instance API returns no terminated instances (instances purged or never created) it skips the
AGE_LIMIT_SECONDS guard entirely. This contradicts the documented “older than 1 day” safety limit
and can delete newly created/provisioning origin=mapt VPCs.
Code

scripts/mapt/delete-mapt-clusters.sh[R444-511]

+  orphan_vpcs=$(
+      aws ec2 describe-vpcs \
+        --region "$region" \
+        --filters "$FILTER" \
+        --query 'Vpcs[].VpcId' \
+        --output text 2>/dev/null
+  )
+
+  for vpc_id in $orphan_vpcs; do
+      # Skip VPCs already handled by the NAT gateway path
+      if echo "$counted_vpcs_string" | grep -q "|${vpc_id}|"; then
+          continue
+      fi
+
+      # Check for any non-terminated EC2 instances in this VPC
+      active_instances=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" \
+            --query 'Reservations[].Instances[?State.Name!=`terminated`].InstanceId' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$active_instances" ]; then
+          continue
+      fi
+
+      # Check for terminated instances to apply the age limit
+      terminated_data=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" "Name=instance-state-name,Values=terminated" \
+            --query 'Reservations[].Instances[].LaunchTime' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$terminated_data" ]; then
+          # Use the most recent LaunchTime among terminated instances
+          skip_vpc=false
+          for launch_time in $terminated_data; do
+              age_seconds=$(get_age_seconds "$launch_time")
+              if [ "$age_seconds" -le "$AGE_LIMIT_SECONDS" ]; then
+                  skip_vpc=true
+                  break
+              fi
+          done
+          if $skip_vpc; then
+              continue
+          fi
+      fi
+      # No instances at all (purged from API >1h after termination) or
+      # only terminated instances older than the age limit — VPC is orphaned.
+
+      project_name=$(
+          aws ec2 describe-vpcs \
+            --region "$region" \
+            --vpc-ids "$vpc_id" \
+            --query "Vpcs[0].Tags[?Key=='$PROJECT_TAG_KEY'].Value" \
+            --output text 2>/dev/null
+      )
+      project_name="${project_name:-unknown}"
+
+      echo "  --------------------------------------------------"
+      echo "  ✅ ORPHANED VPC FOUND (no NAT Gateway, no active instances)"
+      echo "     VPC ID: $vpc_id"
+      echo "     Project: $project_name"
+
+      regional_vpc_count=$((regional_vpc_count + 1))
+      regional_vpcs_to_delete="$regional_vpcs_to_delete $vpc_id"
+      counted_vpcs_string="${counted_vpcs_string}${vpc_id}|"
Relevance

⭐⭐ Medium

Only precedent is initial cleanup script PR #235; no history on enforcing age guard in orphan path.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The new orphan discovery applies the age limit only when terminated instances are returned; when
none are returned, it immediately proceeds to schedule the VPC for deletion. The README states the
script targets resources older than 1 day, which this path can violate.

scripts/mapt/delete-mapt-clusters.sh[458-511]
scripts/mapt/README.md[5-6]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Orphan VPC deletion can proceed without any age verification when `terminated_data` is empty, allowing deletion of recently created/provisioning VPCs tagged `origin=mapt`.

## Issue Context
AWS VPC APIs do not expose VPC creation time directly. The current logic only applies age checks when terminated instances are visible.

## Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[444-511]
- scripts/mapt/README.md[5-6]

## Suggested fix approach
Implement a reliable age signal for VPCs with no EC2 history, and *only* proceed when that signal is older than `AGE_LIMIT_SECONDS`. Options:
- Preferably look up the VPC creation event time via CloudTrail (if available in this environment) and compute age from that.
- If CloudTrail is not available, use a conservative fallback: **skip deletion** when no age signal is available (log a warning so it can be investigated).
- Keep the existing terminated-instance-based check as an additional guard.

This preserves the documented “older than 1 day” behavior while still allowing safe cleanup when an age source exists.
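
The conservative fallback could be sketched as a small decision helper. The sketch assumes the caller has already obtained a creation time as epoch seconds from some age source (for example CloudTrail) or has an empty value when no source is available; `vpc_age_gate` and its return codes are hypothetical.

```shell
# Hypothetical helper (not in the script).
# $1: creation time in epoch seconds, or empty/None when no age signal exists
# $2: age limit in seconds (for example "$AGE_LIMIT_SECONDS")
# $3: current time in epoch seconds (optional, defaults to now)
# Returns 0 = old enough to delete, 1 = too new, 2 = age unknown (skip and warn).
vpc_age_gate() {
    local created_epoch="$1" limit="$2" now="${3:-$(date +%s)}"
    # Anything that is not a plain non-negative integer counts as "no signal".
    case "$created_epoch" in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Mirror the existing guard: an age less than or equal to the limit is too new.
    if [ $((now - created_epoch)) -gt "$limit" ]; then
        return 0
    fi
    return 1
}
```

A caller might `continue` to the next VPC on any non-zero result and log a warning only when the result is 2, so that VPCs without an age signal are left for investigation instead of being deleted.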




Remediation recommended

5. Regex matches wrong projects 🐞 Bug ≡ Correctness
Description
Standalone EIP deletion checks membership via grep -q "|${eip_project}|", which is regex-based and
uses unescaped projectName tag values. If a project name contains regex metacharacters, the match
can succeed for the wrong project and release unrelated EIPs.
Code

scripts/mapt/delete-mapt-clusters.sh[R650-653]

+          eip_project=$(echo "$standalone_eips" | jq -r ".[$i].Tags[]? | select(.Key==\"$PROJECT_TAG_KEY\") | .Value // \"unknown\"")
+          eip_project="${eip_project:-unknown}"
+          if echo "$orphan_project_names" | grep -q "|${eip_project}|"; then
+              delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP (project: $eip_project)"
Relevance

⭐⭐ Medium

No prior reviews on escaping grep regex; existing script already uses grep -q with unescaped vars.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The code performs a regex grep match using an unescaped tag-derived string (eip_project) to
decide whether to delete an EIP, which can lead to false-positive matches and unintended deletions.

scripts/mapt/delete-mapt-clusters.sh[645-655]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The standalone EIP cleanup uses `grep` regex matching with a pattern built from `projectName` tag values. Because the value is not escaped and `grep` defaults to regex mode, project names containing regex metacharacters can match other entries, causing incorrect EIP deletions.

### Issue Context
`eip_project` is derived from tags (data), but is used as a regex (code).

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[645-655]

### Proposed fix
- Switch to fixed-string matching:
 - `grep -Fq "|${eip_project}|"`
- Optionally harden further by avoiding the pipe-delimited string approach (e.g., build a set/map of orphan project names and do exact equality checks).
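
A minimal sketch of the fixed-string variant, assuming the same pipe-delimited set format as `orphan_project_names`; `project_in_set` is a hypothetical helper name.

```shell
# Hypothetical helper (not in the script).
# $1: pipe-delimited set such as "|proj-a|proj-b|"
# $2: candidate project name
# Succeeds only when the candidate appears literally between two pipes.
project_in_set() {
    # -F treats the pattern as a fixed string, so "." or "*" in a tag value
    # cannot act as regex metacharacters; "--" guards names starting with "-".
    printf '%s\n' "$1" | grep -Fq -- "|$2|"
}
```

For comparison, `printf '|axb|\n' | grep -q "|a.b|"` succeeds because `.` matches any character, while `project_in_set "|axb|" "a.b"` does not. A project name that itself contains `|` would still be ambiguous with this delimiter, which is the case the set/map alternative addresses.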



6. LB timeout ignored 🐞 Bug ☼ Reliability
Description
After requesting load balancer deletion, the orphan cleanup continues even when
wait_for_resource_deletion times out, which can cause subsequent dependency deletions to fail due
to resources still being in-use. This can leave a VPC partially deleted and require manual cleanup.
Code

scripts/mapt/delete-mapt-clusters.sh[R524-528]

+      for lb_arn in $lbs_arns_in_vpc; do
+          delete_resource "LB" "$lb_arn" "$region" "Orphaned VPC cleanup"
+          wait_for_resource_deletion "LB" "$lb_arn" "$region"
+          regional_lbs=$((regional_lbs + 1))
+      done
Relevance

⭐⭐ Medium

No prior repo evidence about halting cleanup on deletion-wait timeout; PR #235 explicitly continued
after timeouts.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The wait function explicitly returns 1 on timeout, but the caller does not branch on that result
and continues cleanup unconditionally.

scripts/mapt/delete-mapt-clusters.sh[56-98]
scripts/mapt/delete-mapt-clusters.sh[524-528]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The script does not act on `wait_for_resource_deletion` failure for LBs, so it may proceed with dependency deletions while the LB (and its ENIs) still exist.

## Issue Context
`wait_for_resource_deletion` returns non-zero on timeout, but the LB caller ignores it.

## Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[56-98]
- scripts/mapt/delete-mapt-clusters.sh[524-528]

## Suggested fix approach
- Check the return code of `wait_for_resource_deletion "LB" ...`.
- On timeout/failure, either:
 - `continue` to the next VPC (skip the rest of cleanup for this VPC to avoid noisy failures / partial teardown), or
 - increase the timeout and/or add an additional retry/backoff, and only proceed when deletion is confirmed.
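
The first option could be sketched as below. `delete_resource` and `wait_for_resource_deletion` are the script's existing functions, called with the argument order shown in the diff; the wrapper `delete_vpc_lbs` is a hypothetical name introduced only for this illustration.

```shell
# Hypothetical wrapper (not in the script).
# $1: region; remaining arguments: load balancer ARNs in the VPC.
# Returns 1 as soon as one LB deletion cannot be confirmed.
delete_vpc_lbs() {
    local region="$1" lb_arn
    shift
    for lb_arn in "$@"; do
        delete_resource "LB" "$lb_arn" "$region" "Orphaned VPC cleanup"
        # Stop here on timeout so dependent resources are not deleted
        # while the LB and its ENIs may still exist.
        if ! wait_for_resource_deletion "LB" "$lb_arn" "$region"; then
            echo "WARN: LB $lb_arn not confirmed deleted; skipping rest of VPC" >&2
            return 1
        fi
        # Count only load balancers whose deletion was confirmed.
        regional_lbs=$((${regional_lbs:-0} + 1))
    done
    return 0
}
```

Inside the per-VPC loop this would be used as `delete_vpc_lbs "$region" $lbs_arns_in_vpc || continue`, which moves on to the next VPC when a timeout occurs.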



Previous review results

Review updated until commit 5e573c6

Results up to commit 7bb6288


🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0)


Action required
1. Unscoped EIP deletion 🐞 Bug ≡ Correctness

2. Orphan VPC bypasses age 🐞 Bug ≡ Correctness

Remediation recommended
3. LB timeout ignored 🐞 Bug ☼ Reliability

Results up to commit b715681


🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0)


Action required
1. None breaks orphan checks 🐞 Bug ≡ Correctness
Description
In the orphan-VPC path, active_instances and terminated_data are treated as “present” based only
on -n, so an AWS CLI --output text result of the literal string None will incorrectly skip VPC
deletion or feed None into get_age_seconds() and trip the age guard. This can prevent orphaned
VPC cleanup from ever running for some VPCs, re-creating the leak this PR is intended to fix.
Code

scripts/mapt/delete-mapt-clusters.sh[R459-490]

+      # Check for any non-terminated EC2 instances in this VPC
+      active_instances=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" \
+            --query 'Reservations[].Instances[?State.Name!=`terminated`].InstanceId' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$active_instances" ]; then
+          continue
+      fi
+
+      # Check for terminated instances to apply the age limit
+      terminated_data=$(
+          aws ec2 describe-instances \
+            --region "$region" \
+            --filters "Name=vpc-id,Values=$vpc_id" "Name=instance-state-name,Values=terminated" \
+            --query 'Reservations[].Instances[].LaunchTime' \
+            --output text 2>/dev/null
+      )
+      if [ -n "$terminated_data" ]; then
+          skip_vpc=false
+          for launch_time in $terminated_data; do
+              age_seconds=$(get_age_seconds "$launch_time")
+              if [ "$age_seconds" -le "$AGE_LIMIT_SECONDS" ]; then
+                  skip_vpc=true
+                  break
+              fi
+          done
+          if $skip_vpc; then
+              continue
+          fi
Relevance

⭐⭐⭐ High

Repo already guards AWS CLI text output 'None' elsewhere; likely accept consistent None-handling
fix.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The script explicitly handles None as a possible AWS CLI --output text value elsewhere, but the
new orphan-VPC logic only checks -n and can therefore mis-handle None and skip cleanup or
miscompute age.

scripts/mapt/delete-mapt-clusters.sh[88-90]
scripts/mapt/delete-mapt-clusters.sh[459-490]
scripts/mapt/delete-mapt-clusters.sh[500-508]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The orphan-VPC cleanup treats any non-empty `--output text` result as real data. When AWS CLI returns the literal string `None`, the script either:
- skips deletion (`active_instances` check), or
- treats `None` as a timestamp, making `get_age_seconds` return `0` and causing the VPC to be skipped as “too new”.

### Issue Context
This script already anticipates `None` from AWS CLI text output in other locations, but the new orphan-VPC branch does not.

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[459-490]
- scripts/mapt/delete-mapt-clusters.sh[492-509]

### Proposed fix
- After each AWS CLI `--output text` capture used as a presence check, normalize `None` to empty (or explicitly check `!= "None"`).
 - Example:
   - `if [ -n "$active_instances" ] && [ "$active_instances" != "None" ]; then ...`
   - `if [ -n "$terminated_data" ] && [ "$terminated_data" != "None" ]; then ... else ...`
- Similarly, when iterating timestamps, skip non-timestamps (e.g., `None`) before calling `get_age_seconds`.
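
The normalization described above could look roughly like this. It is a sketch under assumptions: `normalize_aws_text` and `looks_like_timestamp` are hypothetical helper names, not functions that exist in the script, and the timestamp check only looks at the leading `YYYY-MM-DDT` shape.

```shell
# Hypothetical helper: map the AWS CLI text-mode placeholder "None" to empty.
normalize_aws_text() {
    if [ "$1" = "None" ]; then
        printf ''
    else
        printf '%s' "$1"
    fi
}

# Hypothetical helper: succeed only for values shaped like an ISO 8601
# timestamp (e.g. 2026-06-12T14:00:00+00:00), so "None" never reaches
# get_age_seconds.
looks_like_timestamp() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use in the orphan-VPC branch:
#   active_instances=$(normalize_aws_text "$active_instances")
#   for launch_time in $terminated_data; do
#       looks_like_timestamp "$launch_time" || continue
#       age_seconds=$(get_age_seconds "$launch_time")
#       ...
#   done
```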




Remediation recommended
2. Regex matches wrong projects 🐞 Bug ≡ Correctness
Description
Standalone EIP deletion checks membership via grep -q "|${eip_project}|", which is regex-based and
uses unescaped projectName tag values. If a project name contains regex metacharacters, the match
can succeed for the wrong project and release unrelated EIPs.
Code

scripts/mapt/delete-mapt-clusters.sh[R650-653]

+          eip_project=$(echo "$standalone_eips" | jq -r ".[$i].Tags[]? | select(.Key==\"$PROJECT_TAG_KEY\") | .Value // \"unknown\"")
+          eip_project="${eip_project:-unknown}"
+          if echo "$orphan_project_names" | grep -q "|${eip_project}|"; then
+              delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP (project: $eip_project)"
Relevance

⭐⭐ Medium

No prior reviews on escaping grep regex; existing script already uses grep -q with unescaped vars.

PR-#235

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The code performs a regex grep match using an unescaped tag-derived string (eip_project) to
decide whether to delete an EIP, which can lead to false-positive matches and unintended deletions.

scripts/mapt/delete-mapt-clusters.sh[645-655]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The standalone EIP cleanup uses `grep` regex matching with a pattern built from `projectName` tag values. Because the value is not escaped and `grep` defaults to regex mode, project names containing regex metacharacters can match other entries, causing incorrect EIP deletions.

### Issue Context
`eip_project` is derived from tags (data), but is used as a regex (code).

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[645-655]

### Proposed fix
- Switch to fixed-string matching:
 - `grep -Fq "|${eip_project}|"`
- Optionally harden further by avoiding the pipe-delimited string approach (e.g., build a set/map of orphan project names and do exact equality checks).
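
A minimal sketch of the fixed-string variant, assuming the orphan list keeps its pipe-delimited `|name|name|` shape. `project_in_list` is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: exact membership test against a pipe-delimited list
# such as "|proj-a|proj-b|". -F makes grep treat the pattern as a literal
# string, so characters like "." or "*" in a project name match only themselves.
project_in_list() {
    list="$1"; project="$2"
    printf '%s\n' "$list" | grep -Fq -- "|${project}|"
}

# Possible use in the standalone-EIP pass:
#   if project_in_list "$orphan_project_names" "$eip_project"; then
#       delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP (project: $eip_project)"
#   fi
```

With the original regex match, a tag value of `proj.b` would also match a list entry `projXb`; with `-F` it does not.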


@qodo-app-for-konflux-ci


PR Summary by Qodo

Fix mapt cleanup: delete orphaned VPCs without NAT gateways + LB/ENI race handling
🐞 Bug fix ✨ Enhancement 🕐 40+ Minutes


Walkthroughs

Description
• Add a VPC discovery pass for mapt clusters that run without NAT gateways
  (kind/NatGatewayModeNone).
• Only delete tagged VPCs with no active EC2 instances and past the age threshold.
• Poll LB deletion and skip ELB-managed ENIs to avoid async cleanup races.
Diagram
graph TD
  A["delete-mapt-clusters.sh"] --> B["Discover VPCs via NAT GW tags"] --> F["Cleanup dependencies"] --> G["Final VPC delete"]
  A --> C["Discover VPCs via VPC tag (origin=mapt)"] --> D["Check EC2 instances (active/terminated+age)"] --> F
  F --> E["Delete & poll Load Balancers"]
  F --> H["Delete ENIs/EIPs (skip ELB ENIs)"]
High-Level Assessment

The following are alternative approaches to this PR:

1. Use Resource Groups Tagging API for discovery
  • ➕ Single API to find all tagged resources (VPCs, ENIs, EIPs, LBs) without service-by-service queries
  • ➕ Reduces risk of missing a dependency type as AWS adds new resources
  • ➖ Still requires ordered deletion logic and service-specific delete semantics
  • ➖ Additional permissions and more complex paging/filters
2. Move to TTL-based lifecycle management (tag + scheduled janitor)
  • ➕ Prevents orphans proactively instead of relying on periodic heavy cleanup
  • ➕ Clear ownership/expiry semantics (e.g., expires_at tag)
  • ➖ Requires changes in cluster provisioning to set TTL tags consistently
  • ➖ Doesn’t immediately address existing orphaned inventory without a one-time migration run
3. Rely on mapt-managed teardown (make deletion idempotent)
  • ➕ Best source of truth for what was created; less guesswork in cleanup script
  • ➕ Can ensure correct dependency ordering and retries are handled centrally
  • ➖ May require non-trivial changes in mapt and rollout coordination
  • ➖ Doesn’t help when teardown is never invoked (crashes/timeouts) unless paired with janitor

Recommendation: Current approach is appropriate as a pragmatic remediation: it keeps the existing NAT-based path, adds a tag-based backstop for kind clusters, and introduces LB polling/ELB-ENI exclusions to reduce race-related failures. Longer-term, consider adding TTL tags at provision time to reduce reliance on inference (instance presence/age) and simplify discovery.


File Changes

Bug fix (1)
delete-mapt-clusters.sh Add tag-based orphan VPC discovery and safer LB/ENI cleanup ordering +235/-42


• Adds a second discovery path that finds origin=mapt VPCs directly (covering clusters without NAT gateways) and only targets those with no active instances and past the age limit. Extends deletion polling to support ALB/NLB deletion and updates cleanup to wait for LB removal and skip ELB-managed ENIs, reducing async race conditions during teardown.

scripts/mapt/delete-mapt-clusters.sh



Comment thread scripts/mapt/delete-mapt-clusters.sh Outdated
Comment on lines +549 to +555
# --- 3.3 Standalone EIPs tagged with origin=mapt in this VPC ---
# (EIPs not attached to an ENI won't be caught above)
standalone_eips=$(aws ec2 describe-addresses --region "$region" --filters "$FILTER" --query "Addresses[?AssociationId==null].AllocationId" --output text 2>/dev/null)
for eip_alloc_id in $standalone_eips; do
delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP"
regional_eips=$((regional_eips + 1))
done


Action required

1. Unscoped eip deletion 🐞 Bug ≡ Correctness

In the orphan-VPC cleanup path, standalone_eips is computed only by origin=mapt and
AssociationId==null, so any unassociated tagged EIP in the *region* can be released while
processing an unrelated orphan VPC. This can break other mapt clusters/workflows and will be
re-attempted once per orphan VPC processed.
Agent Prompt
## Issue description
The orphan-VPC cleanup loop releases **all** unassociated EIPs tagged `origin=mapt` in the region, even if they do not belong to the orphan VPC currently being cleaned up.

## Issue Context
EIPs do not have a direct VPC linkage when unassociated, so a tag-only regional query can impact unrelated clusters.

## Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[549-555]

## Suggested fix approach
- Remove the per-orphan-VPC regional EIP deletion loop, OR
- Restrict EIP cleanup to EIPs that can be safely attributed to the orphan VPC/project, e.g.:
  - Filter by `projectName==$project_name` (if EIPs are tagged with projectName), and/or
  - Only delete EIPs that were previously attached to ENIs in the orphan VPC (track those ENI->EIP allocation IDs before deletion), and/or
  - Run a single regional pass *after* computing a set of orphan project names, and delete only EIPs whose tags match those orphan projects.
- Ensure the EIP cleanup is not executed repeatedly per VPC (avoid duplicate delete attempts).
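
The "single regional pass" option could be sketched as a filter that runs once after the per-VPC loop. This is an illustration under assumptions: `select_orphan_eips` is a hypothetical helper, and the `describe-addresses` query shown in the comment is an untested assumption about how to produce its tab-separated input.

```shell
# Hypothetical helper: read "allocation-id<TAB>project" rows on stdin and
# print only the allocation IDs whose project appears in the pipe-delimited
# orphan list passed as $1 (e.g. "|proj-a|proj-b|").
select_orphan_eips() {
    orphan_projects="$1"
    tab=$(printf '\t')
    while IFS="$tab" read -r alloc_id project; do
        # Skip blank rows and the text-mode "None" placeholder.
        [ -n "$alloc_id" ] && [ "$alloc_id" != "None" ] || continue
        [ -n "$project" ] && [ "$project" != "None" ] || continue
        case "$orphan_projects" in
            *"|${project}|"*) printf '%s\n' "$alloc_id" ;;
        esac
    done
}

# Possible use, once per region after the VPC loop (query is an assumption):
#   aws ec2 describe-addresses --region "$region" --filters "$FILTER" \
#       --query "Addresses[?AssociationId==null].[AllocationId, Tags[?Key=='$PROJECT_TAG_KEY']|[0].Value]" \
#       --output text |
#   select_orphan_eips "$orphan_project_names" |
#   while read -r eip_alloc_id; do
#       delete_resource "EIP" "$eip_alloc_id" "$region" "Orphaned standalone EIP"
#   done
```

Untagged EIPs are skipped rather than attributed to an `unknown` project, which errs on the side of leaving an address allocated.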


Comment on lines +444 to +511
orphan_vpcs=$(
aws ec2 describe-vpcs \
--region "$region" \
--filters "$FILTER" \
--query 'Vpcs[].VpcId' \
--output text 2>/dev/null
)

for vpc_id in $orphan_vpcs; do
# Skip VPCs already handled by the NAT gateway path
if echo "$counted_vpcs_string" | grep -q "|${vpc_id}|"; then
continue
fi

# Check for any non-terminated EC2 instances in this VPC
active_instances=$(
aws ec2 describe-instances \
--region "$region" \
--filters "Name=vpc-id,Values=$vpc_id" \
--query 'Reservations[].Instances[?State.Name!=`terminated`].InstanceId' \
--output text 2>/dev/null
)
if [ -n "$active_instances" ]; then
continue
fi

# Check for terminated instances to apply the age limit
terminated_data=$(
aws ec2 describe-instances \
--region "$region" \
--filters "Name=vpc-id,Values=$vpc_id" "Name=instance-state-name,Values=terminated" \
--query 'Reservations[].Instances[].LaunchTime' \
--output text 2>/dev/null
)
if [ -n "$terminated_data" ]; then
# Use the most recent LaunchTime among terminated instances
skip_vpc=false
for launch_time in $terminated_data; do
age_seconds=$(get_age_seconds "$launch_time")
if [ "$age_seconds" -le "$AGE_LIMIT_SECONDS" ]; then
skip_vpc=true
break
fi
done
if $skip_vpc; then
continue
fi
fi
# No instances at all (purged from API >1h after termination) or
# only terminated instances older than the age limit — VPC is orphaned.

project_name=$(
aws ec2 describe-vpcs \
--region "$region" \
--vpc-ids "$vpc_id" \
--query "Vpcs[0].Tags[?Key=='$PROJECT_TAG_KEY'].Value" \
--output text 2>/dev/null
)
project_name="${project_name:-unknown}"

echo " --------------------------------------------------"
echo " ✅ ORPHANED VPC FOUND (no NAT Gateway, no active instances)"
echo " VPC ID: $vpc_id"
echo " Project: $project_name"

regional_vpc_count=$((regional_vpc_count + 1))
regional_vpcs_to_delete="$regional_vpcs_to_delete $vpc_id"
counted_vpcs_string="${counted_vpcs_string}${vpc_id}|"


Action required

2. Orphan vpc bypasses age 🐞 Bug ≡ Correctness

The orphan-VPC path marks a VPC for deletion whenever there are no active instances, but if the
instance API returns no terminated instances (instances purged or never created) it skips the
AGE_LIMIT_SECONDS guard entirely. This contradicts the documented “older than 1 day” safety limit
and can delete newly created/provisioning origin=mapt VPCs.
Agent Prompt
## Issue description
Orphan VPC deletion can proceed without any age verification when `terminated_data` is empty, allowing deletion of recently created/provisioning VPCs tagged `origin=mapt`.

## Issue Context
AWS VPC APIs do not expose VPC creation time directly. The current logic only applies age checks when terminated instances are visible.

## Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[444-511]
- scripts/mapt/README.md[5-6]

## Suggested fix approach
Implement a reliable age signal for VPCs with no EC2 history, and *only* proceed when that signal is older than `AGE_LIMIT_SECONDS`. Options:
- Preferably look up the VPC creation event time via CloudTrail (if available in this environment) and compute age from that.
- If CloudTrail is not available, use a conservative fallback: **skip deletion** when no age signal is available (log a warning so it can be investigated).
- Keep the existing terminated-instance-based check as an additional guard.

This preserves the documented “older than 1 day” behavior while still allowing safe cleanup when an age source exists.
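
The conservative fallback can be sketched as a pure decision function, which keeps the age logic testable without AWS access. `orphan_vpc_decision` is a hypothetical name, and the CloudTrail lookup in the comment is an assumption about how the creation time might be obtained (the `EventTime` format can differ between AWS CLI versions).

```shell
# Hypothetical helper: decide what to do with a VPC that has no EC2 history.
# $1 = VPC creation time in epoch seconds (empty or "None" when unknown)
# $2 = current time in epoch seconds
# $3 = age limit in seconds
orphan_vpc_decision() {
    created_epoch="$1"; now_epoch="$2"; limit_seconds="$3"
    if [ -z "$created_epoch" ] || [ "$created_epoch" = "None" ]; then
        echo "skip-no-age-signal"      # no evidence of age: do not delete
    elif [ $((now_epoch - created_epoch)) -gt "$limit_seconds" ]; then
        echo "delete"
    else
        echo "skip-too-new"
    fi
}

# Possible source for the creation time (assumption, not verified here):
#   aws cloudtrail lookup-events --region "$region" \
#       --lookup-attributes AttributeKey=ResourceName,AttributeValue="$vpc_id" \
#       --query 'Events[?EventName==`CreateVpc`].EventTime | [0]' --output text
```

This sketch skips deletion when no age signal exists; treating a missing event as "old enough" is the opposite trade-off and would need only the first branch changed.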


psturc added 2 commits June 12, 2026 16:55
mapt kind clusters use NatGatewayModeNone, so the existing NAT-gateway-based
VPC discovery never finds them. This left hundreds of orphaned VPCs with
their networking resources (IGWs, subnets, SGs, route tables) accumulating
across us-west-2, us-east-1, and us-east-2.

Add a second discovery pass that finds origin=mapt VPCs directly,
skips any with active EC2 instances, and cleans up dependencies.
Also add LB deletion polling to avoid async race conditions with
ELB-managed ENIs/EIPs.

Assisted-by: Cursor

Standalone EIPs were deleted region-wide inside the per-VPC loop,
potentially affecting unrelated mapt clusters. Move to a single
post-loop pass filtered by orphan projectName tags.

Add CloudTrail-based age check for VPCs with no EC2 instance history
to prevent deleting recently provisioned VPCs. Missing CloudTrail
events (>90d retention) are treated as old and safe to delete.

Assisted-by: Cursor
@psturc psturc force-pushed the fix-mapt-deletion-script branch from 7bb6288 to b715681 Compare June 12, 2026 14:55
@qodo-app-for-konflux-ci

qodo-app-for-konflux-ci Bot commented Jun 12, 2026


Code review by qodo was updated up to the latest commit b715681

Comment on lines +459 to +490
# Check for any non-terminated EC2 instances in this VPC
active_instances=$(
aws ec2 describe-instances \
--region "$region" \
--filters "Name=vpc-id,Values=$vpc_id" \
--query 'Reservations[].Instances[?State.Name!=`terminated`].InstanceId' \
--output text 2>/dev/null
)
if [ -n "$active_instances" ]; then
continue
fi

# Check for terminated instances to apply the age limit
terminated_data=$(
aws ec2 describe-instances \
--region "$region" \
--filters "Name=vpc-id,Values=$vpc_id" "Name=instance-state-name,Values=terminated" \
--query 'Reservations[].Instances[].LaunchTime' \
--output text 2>/dev/null
)
if [ -n "$terminated_data" ]; then
skip_vpc=false
for launch_time in $terminated_data; do
age_seconds=$(get_age_seconds "$launch_time")
if [ "$age_seconds" -le "$AGE_LIMIT_SECONDS" ]; then
skip_vpc=true
break
fi
done
if $skip_vpc; then
continue
fi


Action required

1. None breaks orphan checks 🐞 Bug ≡ Correctness

In the orphan-VPC path, active_instances and terminated_data are treated as “present” based only
on -n, so an AWS CLI --output text result of the literal string None will incorrectly skip VPC
deletion or feed None into get_age_seconds() and trip the age guard. This can prevent orphaned
VPC cleanup from ever running for some VPCs, re-creating the leak this PR is intended to fix.
Agent Prompt
### Issue description
The orphan-VPC cleanup treats any non-empty `--output text` result as real data. When AWS CLI returns the literal string `None`, the script either:
- skips deletion (`active_instances` check), or
- treats `None` as a timestamp, making `get_age_seconds` return `0` and causing the VPC to be skipped as “too new”.

### Issue Context
This script already anticipates `None` from AWS CLI text output in other locations, but the new orphan-VPC branch does not.

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[459-490]
- scripts/mapt/delete-mapt-clusters.sh[492-509]

### Proposed fix
- After each AWS CLI `--output text` capture used as a presence check, normalize `None` to empty (or explicitly check `!= "None"`).
  - Example:
    - `if [ -n "$active_instances" ] && [ "$active_instances" != "None" ]; then ...`
    - `if [ -n "$terminated_data" ] && [ "$terminated_data" != "None" ]; then ... else ...`
- Similarly, when iterating timestamps, skip non-timestamps (e.g., `None`) before calling `get_age_seconds`.


NLB-managed ENIs were fully skipped during orphan VPC cleanup,
leaving their EIPs allocated. This blocked IGW detachment with
DependencyViolation ("mapped public addresses"), preventing VPC
deletion. Release EIPs from ELB ENIs before skipping the ENI.

Assisted-by: Cursor
@qodo-app-for-konflux-ci

qodo-app-for-konflux-ci Bot commented Jun 12, 2026


Code review by qodo was updated up to the latest commit 5e573c6

Comment on lines +556 to +560
eips_data=$(aws ec2 describe-addresses --region "$region" --query "Addresses[?NetworkInterfaceId=='$eni_id'].AllocationId" --output text 2>/dev/null)
for eip_alloc_id in $eips_data; do
delete_resource "EIP" "$eip_alloc_id" "$region" "ELB ENI EIP release"
regional_eips=$((regional_eips + 1))
done


Action required

1. Eip release may fail 🐞 Bug ☼ Reliability

In the ELB-managed ENI branch, the script queries EIPs by NetworkInterfaceId (i.e., currently
associated) and then calls the generic EIP deleter which only runs aws ec2 release-address without
disassociating first, so EIP cleanup can fail and block later dependency deletion. The loop also
doesn’t guard against AWS CLI text output returning the literal string None, which can trigger an
invalid release-address --allocation-id None call.
Agent Prompt
### Issue description
The orphan-VPC cleanup’s ELB-ENI path attempts to release EIPs that are still associated to an ENI by calling `delete_resource "EIP"`, but `delete_resource` uses `aws ec2 release-address` only (no disassociation). This can fail and leave resources that later prevent IGW detach / VPC delete. The loop can also attempt to release a literal `None` allocation id.

### Issue Context
This happens only for ENIs with `InterfaceType` corresponding to ELB-managed ENIs (the branch that currently does `continue` after EIP handling).

### Fix Focus Areas
- scripts/mapt/delete-mapt-clusters.sh[556-560]
- scripts/mapt/delete-mapt-clusters.sh[101-131]

### Suggested fix approach
1. When looking up EIPs for an ENI, also retrieve `AssociationId` (or detect association via `describe-addresses`).
2. If an EIP is associated, disassociate it first (`aws ec2 disassociate-address --association-id ...`), then release by allocation id.
3. Treat `None`/empty results as “no EIPs” (skip the loop) so you don’t call `release-address` with `None`.
4. (Optional but safer) Only increment `regional_eips` after the disassociate/release succeeds (check exit codes).
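
Steps 1 to 4 can be sketched as follows. This is a minimal sketch, not the script's `delete_resource` implementation: `run_cmd` and `release_eip` are hypothetical names, and the dry-run switch is assumed here only so the flow can be exercised without touching AWS.

```shell
# Hypothetical wrapper: print the command in dry-run mode, execute it otherwise.
run_cmd() {
    if [ "${DRY_RUN:-true}" = "true" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: release one EIP, disassociating it first when needed.
# $1 = region, $2 = allocation id, $3 = association id (empty/"None" if none)
release_eip() {
    region="$1"; alloc_id="$2"; assoc_id="$3"
    # Treat empty or "None" allocation IDs as "no EIP to release".
    if [ -z "$alloc_id" ] || [ "$alloc_id" = "None" ]; then
        return 0
    fi
    if [ -n "$assoc_id" ] && [ "$assoc_id" != "None" ]; then
        run_cmd aws ec2 disassociate-address --region "$region" \
            --association-id "$assoc_id" || return 1
    fi
    run_cmd aws ec2 release-address --region "$region" --allocation-id "$alloc_id"
}

# Possible use: count the EIP only when the release succeeded.
#   if release_eip "$region" "$eip_alloc_id" "$eip_assoc_id"; then
#       regional_eips=$((regional_eips + 1))
#   fi
```

Disassociation of an address attached to a service-managed ENI may still be rejected while the load balancer exists, so this likely belongs after LB deletion has been confirmed.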
