Skip to content

Conversation

@jhjaggars
Copy link
Contributor

Summary

This PR fixes two sources of flakiness in AKS e2e tests identified from failures in PR #6975:

Changes

1. Add network-node-identity to podCrashTolerations

File: test/e2e/util/util.go

The network-node-identity webhook container can restart once during cluster initialization due to startup timing issues. This change adds it to the crash tolerations map with a tolerance of 1 restart.

Previous behavior: Test fails immediately on any container restart
New behavior: Test tolerates 1 restart for network-node-identity

2. Fix TestOnCreateAPIUX race condition

File: test/e2e/create_cluster_test.go

The test was using a hardcoded name "base" for all HostedCluster resources in sequential test iterations. When the delete from iteration N didn't complete before iteration N+1 started, the create would fail with "already exists" error.

Previous behavior: All test iterations use hardcoded name "base", causing race conditions
New behavior: Each test iteration generates a unique name using timestamp and index

Test Plan

  • make staticcheck passes
  • make verify passes
  • AKS e2e tests pass in CI

Notes

These are pre-existing flakiness issues unrelated to the scale-from-zero functionality in PR #6975. The failures were environmental/test infrastructure issues rather than product bugs.

Fix two sources of test flakiness in AKS e2e tests:

1. Add network-node-identity to podCrashTolerations map to allow
   one restart for webhook startup timing issues

2. Generate unique names for HostedCluster resources in
   TestOnCreateAPIUX to prevent "already exists" race condition
   from sequential test iterations

Signed-off-by: Jesse Jaggars <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 18, 2025

Walkthrough

Two test utility files were modified: unique HostedCluster names are generated by appending loop index and nanosecond timestamp to avoid race conditions during cluster creation tests, and a new pod crash toleration entry for network-node-identity webhook startup timing was added to the crash tolerations map.

Changes

Cohort / File(s) Summary
Test cluster creation naming
test/e2e/create_cluster_test.go
Introduces loop index variable for tc.validations loop to generate unique HostedCluster.Name by appending loop index and nanosecond timestamp, preventing race conditions.
Pod crash tolerations configuration
test/e2e/util/util.go
Adds "network-node-identity": 1 entry to podCrashTolerations map with comment explaining one restart allowance for webhook startup timing.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Verify index variable is correctly used in the loop for generating unique names
  • Confirm nanosecond timestamp generation prevents naming collisions
  • Check that the new pod crash toleration entry (network-node-identity) is correctly positioned in the map
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c33b48b and aea607e.

📒 Files selected for processing (2)
  • test/e2e/create_cluster_test.go (1 hunks)
  • test/e2e/util/util.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/util/util.go
  • test/e2e/create_cluster_test.go
🧬 Code graph analysis (1)
test/e2e/create_cluster_test.go (3)
support/assets/readasset.go (1)
  • ShouldHostedCluster (81-85)
karpenter-operator/controllers/karpenter/assets/assets.go (1)
  • ReadFile (23-25)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go (1)
  • Now (77-77)
🔇 Additional comments (2)
test/e2e/util/util.go (1)

127-128: New crash toleration for network-node-identity is consistent and scoped

Key name matches other usages in this file, and allowing a single restart integrates cleanly with EnsureNoCrashingPods without broadening tolerances for other components.

test/e2e/create_cluster_test.go (1)

1693-1699: Unique HostedCluster names correctly eliminate create-time name collisions

Using the loop index plus a nanosecond timestamp produces short, DNS-compliant, unique names per validation run and addresses the “already exists” race without affecting the rest of the admission-testing logic.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added do-not-merge/needs-area area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Dec 18, 2025
@jhjaggars
Copy link
Contributor Author

/test ?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

@jhjaggars: The following commands are available to trigger required jobs:

/test e2e-aks
/test e2e-aks-4-21
/test e2e-aks-override
/test e2e-aws
/test e2e-aws-4-21
/test e2e-aws-override
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test images
/test okd-scos-images
/test security
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-autonode
/test e2e-aws-metrics
/test e2e-aws-minimal
/test e2e-aws-techpreview
/test e2e-azure-aks-ovn-conformance
/test e2e-conformance
/test e2e-kubevirt-aws-ovn
/test e2e-kubevirt-azure-ovn
/test e2e-kubevirt-metal-conformance
/test e2e-openstack-aws
/test e2e-openstack-aws-conformance
/test e2e-openstack-aws-csi-cinder
/test e2e-openstack-aws-csi-manila
/test e2e-openstack-aws-nfv
/test okd-scos-e2e-aws-ovn
/test reqserving-e2e-aws

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-hypershift-main-e2e-aks
pull-ci-openshift-hypershift-main-e2e-aks-4-21
pull-ci-openshift-hypershift-main-e2e-aws
pull-ci-openshift-hypershift-main-e2e-aws-4-21
pull-ci-openshift-hypershift-main-e2e-aws-upgrade-hypershift-operator
pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn-reduced
pull-ci-openshift-hypershift-main-images
pull-ci-openshift-hypershift-main-okd-scos-images
pull-ci-openshift-hypershift-main-security
pull-ci-openshift-hypershift-main-unit
pull-ci-openshift-hypershift-main-verify
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jhjaggars
Copy link
Contributor Author

/test e2e-aks-4-21

@csrwng
Copy link
Contributor

csrwng commented Dec 18, 2025

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, jhjaggars

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025
@jhjaggars jhjaggars marked this pull request as ready for review December 18, 2025 21:16
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025
@openshift-ci openshift-ci bot requested review from csrwng and devguyio December 18, 2025 21:17
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 18, 2025

@jhjaggars: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants