@cfsmp3 cfsmp3 commented Jan 19, 2026

Summary

  • Make tests retry automatically when hitting transient GCP errors (QUOTA_EXCEEDED, ZONE_RESOURCE_POOL_EXHAUSTED)
  • Only permanent errors mark tests as failed

Problem

When GCP quota is exceeded, tests were marked as failed with an error on GitHub:

GCP quota limit reached. Please wait for other tests to complete or contact the administrator.

This was confusing because:

  1. The message said "wait" but the test was already permanently failed
  2. Users saw a red X on their PR for infrastructure issues, not code problems
  3. Tests wouldn't automatically retry when resources became available

Solution

Transient GCP errors now leave the test pending instead of marking it as failed:

  • Test remains in queue without a GcpInstance record
  • Next cron run will attempt to start it again
  • When resources are available, the test runs normally

Permanent errors (like RESOURCE_NOT_FOUND) still fail the test immediately.
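
The resulting behavior can be illustrated with a small, self-contained sketch. This is not the code from this PR: `handle_start_result`, `TRANSIENT_CODES`, and the plain lists standing in for the queue and the failed-test records are hypothetical names used only to show the three outcomes (started, left pending, failed).

```python
from typing import List, Optional

# Hypothetical stand-in for the transient error codes named in this PR.
TRANSIENT_CODES = {"QUOTA_EXCEEDED", "ZONE_RESOURCE_POOL_EXHAUSTED"}


def handle_start_result(
    test_id: int,
    error_code: Optional[str],
    queue: List[int],
    failed: List[int],
) -> str:
    """Decide what happens to a queued test after one start attempt."""
    if error_code is None:
        # The VM was created, so the test leaves the queue and runs.
        queue.remove(test_id)
        return "started"
    if error_code in TRANSIENT_CODES:
        # Transient error: leave the test in the queue untouched so the
        # next cron run attempts to start it again.
        return "pending"
    # Permanent error: remove from the queue and record the failure.
    queue.remove(test_id)
    failed.append(test_id)
    return "failed"


# Simulate two cron runs for test 7: quota is exhausted on the first run
# and available on the second.
queue, failed = [7, 8], []
first = handle_start_result(7, "QUOTA_EXCEEDED", queue, failed)
second = handle_start_result(7, None, queue, failed)
# A permanent error on test 8 fails it immediately.
third = handle_start_result(8, "RESOURCE_NOT_FOUND", queue, failed)
```

In this sketch the transient case deliberately changes no state, which mirrors the idea that the test simply stays queued without a GcpInstance record until a later run succeeds.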

Changes

  • Added GCP_RETRYABLE_ERRORS set defining transient error codes
  • Added get_gcp_error_code() and is_retryable_gcp_error() helpers
  • Modified start_test() to skip mark_test_failed() for retryable errors
  • Updated error message for QUOTA_EXCEEDED to match actual behavior
  • Updated tests to verify new behavior
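
For readers without the diff at hand, here is a minimal sketch of what the two helpers might look like. The names `GCP_RETRYABLE_ERRORS`, `get_gcp_error_code()`, and `is_retryable_gcp_error()` come from this PR, but the bodies and signatures below are assumptions. In particular, the sketch assumes the helpers receive a Compute Engine operation result as a dict with the error list under `error.errors[].code`; the merged code may differ.

```python
from typing import Any, Dict, Optional

# Error codes treated as transient (the two codes named in this PR).
GCP_RETRYABLE_ERRORS = frozenset({
    "QUOTA_EXCEEDED",
    "ZONE_RESOURCE_POOL_EXHAUSTED",
})


def get_gcp_error_code(result: Dict[str, Any]) -> Optional[str]:
    """Return the first error code in an operation result, or None.

    Assumes the shape {"error": {"errors": [{"code": "...", ...}]}}.
    """
    errors = (result.get("error") or {}).get("errors") or []
    if errors and isinstance(errors[0], dict):
        return errors[0].get("code")
    return None


def is_retryable_gcp_error(result: Dict[str, Any]) -> bool:
    """True if the result carries one of the transient error codes."""
    return get_gcp_error_code(result) in GCP_RETRYABLE_ERRORS


# Example operation results (illustrative data, not real API output).
quota_result = {"error": {"errors": [{"code": "QUOTA_EXCEEDED"}]}}
missing_result = {"error": {"errors": [{"code": "RESOURCE_NOT_FOUND"}]}}
ok_result = {"status": "DONE"}
```

With helpers like these, `start_test()` only needs a single check before calling `mark_test_failed()`: if `is_retryable_gcp_error(result)` is true, it returns without marking the test as failed.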

Test plan

  • Updated existing test to verify QUOTA_EXCEEDED doesn't mark test as failed
  • Added new test to verify RESOURCE_NOT_FOUND still marks test as failed
  • CI passes

🤖 Generated with Claude Code

QUOTA_EXCEEDED and ZONE_RESOURCE_POOL_EXHAUSTED are transient errors
that will resolve when other tests complete or resources become available.

Previously, these errors would mark the test as failed with an error
status on GitHub. This was confusing because:
1. The error message said "will be retried" but the test was already failed
2. Users saw a red X on their PR for infrastructure issues, not code problems

Now, transient GCP errors leave the test pending so it will be retried
on the next cron run. Only permanent errors (like RESOURCE_NOT_FOUND)
mark the test as failed.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@canihavesomecoffee canihavesomecoffee merged commit 5d92d0f into master Jan 19, 2026
6 checks passed
@cfsmp3 cfsmp3 deleted the fix/retry-quota-exceeded branch January 19, 2026 15:46