test(e2e): fix AKS e2e test flakiness #7410

jhjaggars · 2025-12-18T19:09:44Z

Summary

This PR fixes two sources of flakiness in AKS e2e tests identified from failures in PR #6975:

Job e2e-aks-4-20
Job e2e-aks-4-21

Changes

1. Add network-node-identity to podCrashTolerations

File: test/e2e/util/util.go

The network-node-identity webhook container can restart once during cluster initialization due to startup timing issues. This change adds it to the crash tolerations map with a tolerance of 1 restart.

Previous behavior: Test fails immediately on any container restart
New behavior: Test tolerates 1 restart for network-node-identity

2. Fix TestOnCreateAPIUX race condition

File: test/e2e/create_cluster_test.go

The test was using a hardcoded name "base" for all HostedCluster resources in sequential test iterations. When the delete from iteration N didn't complete before iteration N+1 started, the create would fail with "already exists" error.

Previous behavior: All test iterations use hardcoded name "base", causing race conditions
New behavior: Each test iteration generates a unique name using timestamp and index

Test Plan

make staticcheck passes
make verify passes
AKS e2e tests pass in CI

Notes

These are pre-existing flakiness issues unrelated to the scale-from-zero functionality in PR #6975. The failures were environmental/test infrastructure issues rather than product bugs.

Fix two sources of test flakiness in AKS e2e tests: 1. Add network-node-identity to podCrashTolerations map to allow one restart for webhook startup timing issues 2. Generate unique names for HostedCluster resources in TestOnCreateAPIUX to prevent "already exists" race condition from sequential test iterations Signed-off-by: Jesse Jaggars <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)

openshift-ci · 2025-12-18T19:09:50Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2025-12-18T19:09:52Z

Walkthrough

Two test utility files were modified: unique HostedCluster names are generated by appending loop index and nanosecond timestamp to avoid race conditions during cluster creation tests, and a new pod crash toleration entry for network-node-identity webhook startup timing was added to the crash tolerations map.

Changes

Cohort / File(s)	Summary
Test cluster creation naming `test/e2e/create_cluster_test.go`	Introduces loop index variable for tc.validations loop to generate unique HostedCluster.Name by appending loop index and nanosecond timestamp, preventing race conditions.
Pod crash tolerations configuration `test/e2e/util/util.go`	Adds "network-node-identity": 1 entry to podCrashTolerations map with comment explaining one restart allowance for webhook startup timing.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Verify index variable is correctly used in the loop for generating unique names
Confirm nanosecond timestamp generation prevents naming collisions
Check that the new pod crash toleration entry (network-node-identity) is correctly positioned in the map

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c33b48b and aea607e.

📒 Files selected for processing (2)

test/e2e/create_cluster_test.go (1 hunks)
test/e2e/util/util.go (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

test/e2e/util/util.go
test/e2e/create_cluster_test.go

🧬 Code graph analysis (1)

test/e2e/create_cluster_test.go (3)

support/assets/readasset.go (1)

ShouldHostedCluster (81-85)

karpenter-operator/controllers/karpenter/assets/assets.go (1)

ReadFile (23-25)

hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go (1)

Now (77-77)

🔇 Additional comments (2)

test/e2e/util/util.go (1)

127-128: New crash toleration for network-node-identity is consistent and scoped

Key name matches other usages in this file, and allowing a single restart integrates cleanly with EnsureNoCrashingPods without broadening tolerances for other components.

test/e2e/create_cluster_test.go (1)

1693-1699: Unique HostedCluster names correctly eliminate create-time name collisions

Using the loop index plus a nanosecond timestamp produces short, DNS-compliant, unique names per validation run and addresses the “already exists” race without affecting the rest of the admission-testing logic.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jhjaggars · 2025-12-18T19:11:34Z

/test ?

openshift-ci · 2025-12-18T19:11:38Z

@jhjaggars: The following commands are available to trigger required jobs:

/test e2e-aks

/test e2e-aks-4-21

/test e2e-aks-override

/test e2e-aws

/test e2e-aws-4-21

/test e2e-aws-override

/test e2e-aws-upgrade-hypershift-operator

/test e2e-kubevirt-aws-ovn-reduced

/test images

/test okd-scos-images

/test security

/test unit

/test verify

/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-autonode

/test e2e-aws-metrics

/test e2e-aws-minimal

/test e2e-aws-techpreview

/test e2e-azure-aks-ovn-conformance

/test e2e-conformance

/test e2e-kubevirt-aws-ovn

/test e2e-kubevirt-azure-ovn

/test e2e-kubevirt-metal-conformance

/test e2e-openstack-aws

/test e2e-openstack-aws-conformance

/test e2e-openstack-aws-csi-cinder

/test e2e-openstack-aws-csi-manila

/test e2e-openstack-aws-nfv

/test okd-scos-e2e-aws-ovn

/test reqserving-e2e-aws

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-hypershift-main-e2e-aks

pull-ci-openshift-hypershift-main-e2e-aks-4-21

pull-ci-openshift-hypershift-main-e2e-aws

pull-ci-openshift-hypershift-main-e2e-aws-4-21

pull-ci-openshift-hypershift-main-e2e-aws-upgrade-hypershift-operator

pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn-reduced

pull-ci-openshift-hypershift-main-images

pull-ci-openshift-hypershift-main-okd-scos-images

pull-ci-openshift-hypershift-main-security

pull-ci-openshift-hypershift-main-unit

pull-ci-openshift-hypershift-main-verify

Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jhjaggars · 2025-12-18T19:12:02Z

/test e2e-aks-4-21

csrwng · 2025-12-18T21:16:00Z

/approve

openshift-ci · 2025-12-18T21:16:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, jhjaggars

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [csrwng]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2025-12-18T23:10:30Z

@jhjaggars: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025

openshift-ci bot added do-not-merge/needs-area area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Dec 18, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025

jhjaggars marked this pull request as ready for review December 18, 2025 21:16

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025

openshift-ci bot requested review from csrwng and devguyio December 18, 2025 21:17

mtulio mentioned this pull request Dec 19, 2025

OCPBUGS-65885: added 1.34 required ELBv2 perms to CCM role #7339

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(e2e): fix AKS e2e test flakiness #7410

test(e2e): fix AKS e2e test flakiness #7410

Uh oh!

jhjaggars commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

coderabbitai bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

jhjaggars commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

jhjaggars commented Dec 18, 2025

Uh oh!

csrwng commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

test(e2e): fix AKS e2e test flakiness #7410

Are you sure you want to change the base?

test(e2e): fix AKS e2e test flakiness #7410

Uh oh!

Conversation

jhjaggars commented Dec 18, 2025

Summary

Changes

1. Add network-node-identity to podCrashTolerations

2. Fix TestOnCreateAPIUX race condition

Test Plan

Notes

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

coderabbitai bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

jhjaggars commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

jhjaggars commented Dec 18, 2025

Uh oh!

csrwng commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Dec 18, 2025 •

edited

Loading