
Conversation

@wewang58
Contributor

What this PR does / why we need it:

This PR implements Cilium installation and basic network policy scenario tests in the AKS environment. The previous PR was closed in favor of this cleaner one.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-1137

Special notes for your reviewer:

  • Added ExternalCNIProvider flag and option to the e2e framework.
  • Implemented logic to install Cilium and skip default network validation when enabled.
  • Added utility functions ApplyYAMLFromURL and wait helpers for Deployments/DaemonSets.
  • Included Cilium-specific installation logic and network policy tests.
  • Passing test log attached.

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 31, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 31, 2025

@wewang58: This pull request references CNTRLPLANE-1137 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Dec 31, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 31, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 31, 2025

Walkthrough

Adds Cilium v1.14.5 manifests and e2e support: CRD, namespace, OLM deployment, RBAC, connectivity test manifests, CLI flag for external CNI, e2e test, and helper utilities to install/validate Cilium on HostedClusters.

Changes

Cohort / File(s) Summary
Cilium CRD
test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
Adds CiliumConfig CustomResourceDefinition (group cilium.io, version v1alpha1) with OpenAPI schema, spec/status, status subresource, and preserve-unknown-fields.
Cilium core manifests
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml, ...-00001-cilium-olm-serviceaccount.yaml, ...-00002-cilium-olm-deployment.yaml, ...-00003-cilium-olm-service.yaml
Adds Namespace cilium, ServiceAccount cilium-olm, OLM Deployment (pinned images, env vars, hostNetwork), and Service mapping 443→9443.
Cilium RBAC
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-00011-*.yaml
Adds Roles, RoleBindings, ClusterRoles, ClusterRoleBindings for leader-election, operator, and cluster-level permissions (including SCC, RBAC, Cilium resources).
Connectivity manifests
test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
Adds large auto-generated connectivity test YAML: multiple Deployments, Services, and CiliumNetworkPolicy resources for comprehensive connectivity scenarios.
E2E test
test/e2e/cilium_network_policy_test.go
New test TestCiliumConnectivity to run Cilium connectivity suite; gated to Azure platform and ExternalCNIProvider == "cilium".
CLI flag & options
test/e2e/e2e_test.go, test/e2e/util/options.go
Adds exported field ExternalCNIProvider string to Options and -e2e.external-cni-provider flag; wires field into platform options.
Framework integration
test/e2e/util/hypershift_framework.go
Adds ExternalCNIProvider to PlatformAgnosticOptions; before() now conditionally installs Cilium when provider == "cilium" and worker nodes exist.
Cilium utilities
test/e2e/util/cilium.go
New comprehensive helpers: manifests, SCC/namespace creation, InstallCilium workflow, CreateCiliumConfig (unstructured CR), GetCiliumNetworkConfig, connectivity resource provisioning and cleanup.
General test helpers
test/e2e/util/util.go
Adds WaitForDeploymentAvailable, WaitForDaemonSetReady, and ApplyYAML (loads multi-doc YAML from file/URL and applies via Server-Side Apply).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci bot commented Dec 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wewang58
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Dec 31, 2025
@wewang58
Contributor Author

@heliubj18 I added all the templates to assets; could you help review them? Thanks!

@wewang58 wewang58 marked this pull request as ready for review January 5, 2026 03:30
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
@openshift-ci openshift-ci bot requested review from csrwng and jparrill January 5, 2026 03:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)

61-64: Consider adding resource limits for completeness.

The deployment specifies resource requests but no limits. While acceptable for e2e test infrastructure, adding limits would prevent unbounded resource consumption if the operator misbehaves.

🔎 Proposed enhancement
           resources:
             requests:
               cpu: 100m
               memory: 512Mi
+            limits:
+              cpu: 500m
+              memory: 1Gi
test/e2e/util/util.go (1)

4198-4211: Consider adding a timeout for HTTP requests.

http.Get uses the default HTTP client which has no timeout. In flaky network conditions, this could cause tests to hang indefinitely.

🔎 Proposed fix to add HTTP timeout
+	httpClient := &http.Client{
+		Timeout: 30 * time.Second,
+	}
+
 	if strings.HasPrefix(pathOrURL, "http://") || strings.HasPrefix(pathOrURL, "https://") {
 		// Download YAML content
-		resp, err := http.Get(pathOrURL)
+		resp, err := httpClient.Get(pathOrURL)
 		if err != nil {
 			return fmt.Errorf("failed to download manifest from %s: %w", pathOrURL, err)
 		}
test/e2e/util/cilium.go (4)

161-161: Use CiliumNamespace constant for consistency.

The constant CiliumNamespace is defined but a hardcoded string "cilium" is used here.

🔎 Proposed fix
-		err := guestClient.Get(ctx, crclient.ObjectKey{Name: "cilium"}, ciliumNs)
+		err := guestClient.Get(ctx, crclient.ObjectKey{Name: CiliumNamespace}, ciliumNs)

320-333: Consider adding parentheses for clarity.

The condition logic is correct due to Go's operator precedence (&& before ||), but explicit parentheses would improve readability.

🔎 Proposed fix
-				if ds.Name == "cilium" || strings.HasPrefix(ds.Name, "cilium-") && !strings.Contains(ds.Name, "operator") {
+				if ds.Name == "cilium" || (strings.HasPrefix(ds.Name, "cilium-") && !strings.Contains(ds.Name, "operator")) {

470-493: Fixed sleep duration may be fragile.

Using time.Sleep with a fixed duration doesn't adapt to varying cluster conditions. Consider polling for specific success indicators if the connectivity test provides them, or document why a fixed duration is necessary.


556-558: Redundant helper function - use ptr.To instead.

The ptrBool helper is redundant since k8s.io/utils/ptr is already imported (line 28) and ptr.To[bool](false) or simply ptr.To(false) can be used.

🔎 Proposed fix

Remove ptrBool function and update the call site:

-		AllowPrivilegeEscalation: ptrBool(false),
+		AllowPrivilegeEscalation: ptr.To(false),

And remove lines 556-558:

-func ptrBool(b bool) *bool {
-	return &b
-}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c1d28e3 and 0fa952c.

📒 Files selected for processing (20)
  • test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
  • test/e2e/cilium_network_policy_test.go
  • test/e2e/e2e_test.go
  • test/e2e/util/cilium.go
  • test/e2e/util/hypershift_framework.go
  • test/e2e/util/options.go
  • test/e2e/util/util.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
  • test/e2e/util/options.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
  • test/e2e/util/hypershift_framework.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
  • test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
  • test/e2e/e2e_test.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
  • test/e2e/cilium_network_policy_test.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
  • test/e2e/util/util.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
  • test/e2e/util/cilium.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
🧬 Code graph analysis (4)
test/e2e/util/options.go (1)
api/hypershift/v1beta1/hostedcluster_types.go (6)
  • NonePlatform (1191-1191)
  • AWSPlatform (1188-1188)
  • KubevirtPlatform (1200-1200)
  • AzurePlatform (1203-1203)
  • PowerVSPlatform (1206-1206)
  • OpenStackPlatform (1209-1209)
test/e2e/util/hypershift_framework.go (5)
test/e2e/util/external_oidc.go (1)
  • ExtOIDCConfig (53-70)
api/hypershift/v1beta1/hostedcluster_types.go (3)
  • NonePlatform (1191-1191)
  • NetworkType (1166-1166)
  • Other (1179-1179)
support/util/visibility.go (1)
  • IsPrivateHC (34-44)
test/e2e/util/util.go (1)
  • WaitForGuestClient (325-373)
test/e2e/util/cilium.go (1)
  • InstallCilium (151-342)
test/e2e/cilium_network_policy_test.go (5)
api/hypershift/v1beta1/hostedcluster_types.go (1)
  • AzurePlatform (1203-1203)
test/e2e/util/hypershift_framework.go (1)
  • NewHypershiftTest (66-78)
support/azureutil/azureutil.go (1)
  • IsAroHCP (217-219)
test/e2e/util/util.go (1)
  • WaitForGuestClient (325-373)
test/e2e/util/cilium.go (1)
  • EnsureCiliumConnectivityTestResources (346-500)
test/e2e/util/cilium.go (3)
support/azureutil/azureutil.go (1)
  • IsAroHCP (217-219)
test/e2e/util/util.go (3)
  • ApplyYAML (4194-4264)
  • WaitForDeploymentAvailable (4163-4175)
  • WaitForDaemonSetReady (4178-4190)
support/util/util.go (1)
  • DeleteIfNeeded (142-144)
🪛 Checkov (3.2.334)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml

[medium] 1-75: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-75: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 1-75: Containers should not share the host network namespace

(CKV_K8S_19)

test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml

[medium] 3-58: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 3-58: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 59-115: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 59-115: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 116-180: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 116-180: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 116-180: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 181-236: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 181-236: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 237-292: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 237-292: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 293-338: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 293-338: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 339-394: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 339-394: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 395-450: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 395-450: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 451-516: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 451-516: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 517-582: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 517-582: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 583-649: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 583-649: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 583-649: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 650-716: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 650-716: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 650-716: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 717-782: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 717-782: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 783-848: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 783-848: Minimize the admission of root containers

(CKV_K8S_23)

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml

[medium] 1-82: Minimize wildcard use in Roles and ClusterRoles

(CKV_K8S_49)

🪛 golangci-lint (2.5.0)
test/e2e/util/cilium.go

[error] 228-228: : # github.com/openshift/hypershift/sync-global-pullsecret [github.com/openshift/hypershift/sync-global-pullsecret.test]
sync-global-pullsecret/sync-global-pullsecret_test.go:228:23: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:234:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:247:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:257:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:270:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:283:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:296:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:309:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:327:12: undefined: NewMockdbusConn

(typecheck)

🔇 Additional comments (26)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml (1)

1-13: LGTM!

The RoleBinding correctly binds the cilium-olm ServiceAccount to the cilium-olm Role within the cilium namespace. The manifest structure is valid and follows Kubernetes RBAC conventions.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml (1)

1-13: LGTM!

The RoleBinding correctly enables leader election for the cilium-olm ServiceAccount by binding it to the leader-election Role within the cilium namespace. The manifest structure is valid.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml (1)

1-12: LGTM! Note AI summary discrepancy.

The ClusterRoleBinding correctly references the ClusterRole cilium-cilium-olm (defined in cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml) and binds it to the cilium-olm ServiceAccount. The manifest structure is valid.

Note: The AI summary incorrectly states this binds ClusterRole "cilium-cilium", but the code correctly references "cilium-cilium-olm".

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml (1)

1-11: LGTM!

The Namespace configuration is appropriate for a system-level CNI namespace:

  • Empty node-selector allows flexible pod scheduling
  • Cluster logging and monitoring labels enable observability
  • Run-level "0" indicates this is an early-lifecycle system namespace

The manifest structure is valid and follows OpenShift conventions.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml (1)

14-28: This ClusterRole is sourced from the upstream isovalent/olm-for-cilium project and follows the expected pattern for Helm operators, which require broad RBAC permissions to orchestrate application deployment and lifecycle management. The broad CRUD permissions on RBAC resources (roles, rolebindings, clusterroles, clusterrolebindings) are necessary for cilium-olm to create and manage Cilium's own RBAC requirements during cluster setup. Since this is a test fixture derived from upstream, the scope and permissions are already vetted by the Cilium project.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml (1)

1-7: LGTM!

The ServiceAccount manifest follows standard Kubernetes conventions and is correctly structured for use by the cilium-olm deployment.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml (1)

1-14: LGTM!

The Service manifest is correctly structured with appropriate port mapping (443→9443) for the cilium-olm component. The selector properly targets pods with the name: cilium-olm label.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml (1)

1-24: LGTM!

The leader election Role follows standard Kubernetes patterns with appropriate permissions for managing ConfigMap-based leader election leases and creating events.

test/e2e/util/options.go (2)

94-95: LGTM!

The ExternalCNIProvider field is properly documented and follows the existing pattern for optional feature flags in the Options struct.


206-212: LGTM!

The ExternalCNIProvider field is correctly propagated to PlatformAgnosticOptions, maintaining consistency with how other platform-specific options are handled in DefaultClusterOptions.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml (1)

1-82: LGTM - Standard Cilium operator RBAC.

The wildcard permissions on cilium.io resources and CRDs are expected for a CNI operator that needs to manage its own custom resources. The remaining rules are appropriately scoped. This aligns with upstream Cilium installation patterns and is acceptable for e2e test infrastructure.

test/e2e/cilium_network_policy_test.go (1)

27-52: LGTM - Well-structured test with proper skip conditions and cleanup.

The test correctly:

  • Runs in parallel
  • Skips appropriately when not on Azure platform or when Cilium isn't the external CNI provider
  • Performs ARO HCP environment check inside the test harness
  • Defers cleanup of connectivity test resources
test/e2e/util/hypershift_framework.go (1)

51-52: LGTM - New fields extend platform options appropriately.

The ExternalCNIProvider field follows the existing pattern established by ExtOIDCConfig.

test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml (1)

1-41: LGTM - Standard CiliumConfig CRD definition.

The CRD follows proper structure with appropriate use of x-kubernetes-preserve-unknown-fields for the operator-managed spec and status fields.

test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml (2)

1-57: Note: Unusual YAML document ordering is valid but unconventional.

The apiVersion and kind fields appear at the end of each document rather than the beginning. While valid YAML (field order doesn't matter), this is atypical and may confuse readers. Since this is marked as auto-generated from Cilium upstream, modifying it may complicate future updates.


116-180: Host networking is intentional for connectivity test scenarios.

The hostNetwork: true in echo-b-host and similar deployments is required for testing host-to-pod and cross-node connectivity scenarios. The static analysis warnings about hostNetwork and security contexts can be safely ignored for these e2e test assets.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml (1)

1-12: LGTM - Properly binds ClusterRole to ServiceAccount.

The binding correctly references the cilium-cilium ClusterRole and the cilium-olm ServiceAccount in the cilium namespace, consistent with other manifests in this PR.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)

68-72: Host networking and broad tolerations are appropriate for CNI operators.

The hostNetwork: true is required for CNI operators to configure the node's network stack. The broad toleration (operator: Exists) ensures the operator can run on control plane nodes if needed. Static analysis warnings can be safely ignored for this use case.

test/e2e/e2e_test.go (1)

93-93: LGTM!

The new CLI flag follows the established pattern for e2e test options and integrates cleanly with the existing flag definitions.

test/e2e/util/util.go (3)

4162-4175: LGTM!

The helper function follows the established gomega Eventually pattern used elsewhere in this file and provides a clean interface for waiting on deployment readiness.


4177-4190: LGTM!

The helper function correctly implements the wait pattern for DaemonSet readiness, consistent with WaitForDeploymentAvailable.


4220-4264: LGTM!

The YAML decoding and Server-Side Apply logic is well-implemented with proper handling of multi-document YAMLs, namespace scoping via RESTMapper, and clear error messages.

test/e2e/util/cilium.go (4)

33-63: LGTM!

Constants are well-organized with clear naming and documentation. Reading CiliumVersion from environment allows flexibility in CI configurations.


65-147: LGTM!

The manifest helper functions are well-structured with appropriate PodSecurity labels and SCC capabilities for Cilium. The local/remote manifest fallback pattern provides good flexibility for testing.


502-528: LGTM!

Good defensive coding with sensible defaults and graceful error handling. The IPv4 detection logic via strings.Contains(clusterNet.CIDR, ".") is simple and effective.


560-629: LGTM!

The CreateCiliumConfig function creates a comprehensive Cilium configuration with appropriate settings for OpenShift integration. Debug mode being enabled is reasonable for e2e test observability.

Comment on lines +39 to +66
- apiGroups:
- ""
resources:
- secrets
verbs:
- '*'
- apiGroups:
- ""
resources:
- serviceaccounts
- configmaps
- secrets
- services
verbs:
- '*'
- apiGroups:
- apps
resources:
- deployments
- daemonsets
verbs:
- '*'
- apiGroups:
- monitoring.coreos.com
resources:
- servicemonitors
verbs:
- '*'
Contributor


⚠️ Potential issue | 🟠 Major

Review the use of wildcard RBAC permissions.

Multiple rules use wildcard verbs (*), granting full control over secrets, serviceaccounts, configmaps, services, deployments, daemonsets, and servicemonitors within the cilium namespace. While operators typically require elevated permissions, wildcards are overly permissive and violate the principle of least privilege.

Consider explicitly listing only the required verbs (e.g., get, list, watch, create, update, patch, delete) rather than using *. This reduces the attack surface if the operator is compromised.

Note: secrets appears in two separate rules (lines 42 and 50), which is redundant.

🤖 Prompt for AI Agents
In
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
around lines 39–66, several RBAC rules use wildcard verbs ("*") and secrets is
duplicated; replace each "*" with the minimal explicit verbs the operator
actually needs (for example get, list, watch, create, update, patch, delete as
applicable per resource) and remove the duplicate secrets entry by consolidating
permissions into a single rule for secrets; ensure each apiGroup/resources block
only lists the precise verbs required for those resources to follow
least-privilege.

Comment on lines +150 to +155
guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
InstallCilium(t, context.Background(), guestClient, hostedCluster)
// wait hosted cluster ready
WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
Contributor


⚠️ Potential issue | 🟠 Major

Use h.ctx instead of context.Background() for proper cancellation propagation.

The test framework already has h.ctx which is properly configured with cancellation. Using context.Background() here means these operations won't respect test cancellation or timeouts, potentially leaving resources orphaned if the test is aborted.

🔎 Proposed fix
 			if opts.NodePoolReplicas == 0 {
 				t.Fatal("NodePool replicas must be positive for Cilium to install.")
 			}
-			guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
-			InstallCilium(t, context.Background(), guestClient, hostedCluster)
+			guestClient := WaitForGuestClient(t, h.ctx, h.client, hostedCluster)
+			InstallCilium(t, h.ctx, guestClient, hostedCluster)
 			// wait hosted cluster ready
-			WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
-			WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
-			ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
+			WaitForNReadyNodes(t, h.ctx, guestClient, opts.NodePoolReplicas, platform)
+			WaitForImageRollout(t, h.ctx, h.client, hostedCluster)
+			ValidateHostedClusterConditions(t, h.ctx, h.client, hostedCluster, true, 10*time.Minute)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
InstallCilium(t, context.Background(), guestClient, hostedCluster)
// wait hosted cluster ready
WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
guestClient := WaitForGuestClient(t, h.ctx, h.client, hostedCluster)
InstallCilium(t, h.ctx, guestClient, hostedCluster)
// wait hosted cluster ready
WaitForNReadyNodes(t, h.ctx, guestClient, opts.NodePoolReplicas, platform)
WaitForImageRollout(t, h.ctx, h.client, hostedCluster)
ValidateHostedClusterConditions(t, h.ctx, h.client, hostedCluster, true, 10*time.Minute)
🤖 Prompt for AI Agents
In test/e2e/util/hypershift_framework.go around lines 150 to 155, several calls
use context.Background() which bypasses the framework's cancellation; replace
each context.Background() with h.ctx (i.e., use h.ctx for WaitForGuestClient,
InstallCilium, WaitForNReadyNodes, WaitForImageRollout, and
ValidateHostedClusterConditions) so the operations inherit the test's context
and proper cancellation/timeouts.

Why:
To enable e2e testing of HyperShift clusters using Cilium as the CNI provider.

How:
- Added `ExternalCNIProvider` flag and option to the e2e framework.
- Implemented logic to install Cilium and skip default network validation when enabled.
- Added utility functions `ApplyYAMLFromURL` and wait helpers for Deployments/DaemonSets.
- Included Cilium-specific installation logic and network policy tests.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @test/e2e/util/util.go:
- Around line 4198-4211: The HTTP download uses the default client (http.Get)
without a timeout which can hang; replace the http.Get(pathOrURL) call with a
dedicated http.Client having a reasonable Timeout (e.g., 10s) and call
client.Get(pathOrURL) instead, then keep the existing resp.Body.Close, status
code check and io.ReadAll handling; locate the download block around the
http.Get usage (the pathOrURL/http.Get/yamlContent section) and swap in the
timeout-capable client to ensure the request fails fast on unresponsive hosts.
♻️ Duplicate comments (1)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml (1)

39-66: Wildcard RBAC and duplicate secrets entry already flagged.

This concern was raised in a previous review. The wildcard verbs and duplicate secrets resource (Lines 42 and 50) should be addressed per the earlier feedback.

🧹 Nitpick comments (7)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)

17-72: Consider adding a security context to restrict privilege escalation.

The deployment lacks a securityContext to explicitly disable privilege escalation. While hostNetwork: true and the broad toleration are necessary for CNI operators, you can still harden the container:

      containers:
        - name: operator
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true

Since this is test infrastructure for e2e validation, this is optional but would align with security best practices flagged by static analysis.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml (1)

5-17: Wildcard permissions on CRDs are broader than necessary.

The wildcard verbs (*) on cilium.io/* resources are acceptable for the Cilium operator. However, the wildcard on customresourcedefinitions (Lines 12-17) grants ability to create/delete any CRD in the cluster, not just Cilium's.

For test infrastructure this is acceptable, but in production environments consider restricting to specific CRD names or limiting verbs to what's actually needed (e.g., get, list, watch, create, update, patch).

test/e2e/util/util.go (2)

4162-4174: Consider strengthening the readiness check.

The current check only verifies that ReadyReplicas > 0, which may not ensure the Deployment is fully ready. If the Deployment has multiple replicas, this could return success prematurely.

🔎 Proposed improvement
 func WaitForDeploymentAvailable(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
 	g := NewWithT(t)
 	t.Logf("Waiting for deployment %s/%s to be ready", namespace, name)
 	g.Eventually(func() bool {
 		deployment := &appsv1.Deployment{}
 		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, deployment)
 		if err != nil {
 			t.Logf("Failed to get deployment %s/%s: %v", namespace, name, err)
 			return false
 		}
-		return deployment.Status.ReadyReplicas > 0
+		// Check that all desired replicas are ready
+		return deployment.Status.ReadyReplicas > 0 && 
+		       deployment.Status.ReadyReplicas == deployment.Status.Replicas &&
+		       deployment.Status.ObservedGeneration >= deployment.Generation
 	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("deployment %s/%s should be ready", namespace, name))
 }

4177-4189: Consider strengthening the DaemonSet readiness check.

The current check only verifies that NumberReady > 0, which doesn't ensure the DaemonSet has pods ready on all desired nodes.

🔎 Proposed improvement
 func WaitForDaemonSetReady(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
 	g := NewWithT(t)
 	t.Logf("Waiting for DaemonSet %s/%s to be ready", namespace, name)
 	g.Eventually(func() bool {
 		ds := &appsv1.DaemonSet{}
 		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, ds)
 		if err != nil {
 			t.Logf("Failed to get DaemonSet %s/%s: %v", namespace, name, err)
 			return false
 		}
-		return ds.Status.NumberReady > 0
+		// Check that all desired pods are ready and the rollout is complete
+		return ds.Status.NumberReady > 0 &&
+		       ds.Status.NumberReady == ds.Status.DesiredNumberScheduled &&
+		       ds.Status.ObservedGeneration >= ds.Generation
 	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("DaemonSet %s/%s should be ready", namespace, name))
 }
test/e2e/util/cilium.go (3)

259-307: Complex Eventually block with nested updates - consider simplifying.

The Eventually block handles the case where the Cilium operator might overwrite the CiliumConfig, but the nested update logic makes it harder to debug failures. Consider extracting this to a helper function or adding more structured logging.

This is not blocking, but could improve maintainability for debugging test failures.


470-493: Replace fixed sleep with polling for test completion.

The current implementation waits a fixed 60 seconds (line 474) and then only checks if pods are still running. This approach has limitations:

  1. Fixed waits are brittle - tests might pass/fail based on timing
  2. Checking only pod phase misses container-level failures (CrashLoopBackOff, errors)
  3. Could result in false positives if connectivity tests are still running

Consider polling for test completion status instead (e.g., checking container termination status or using a Job-based approach) and examining container logs for test results.


556-558: Use existing pointer helper instead of custom function.

The repository already uses k8s.io/utils/ptr (imported as ptr in other files). Consider using ptr.To[bool](b) instead of maintaining a custom ptrBool function.

🔎 Proposed change
-func ptrBool(b bool) *bool {
-	return &b
-}

And update the usage at line 386:

-			AllowPrivilegeEscalation: ptrBool(false),
+			AllowPrivilegeEscalation: ptr.To(false),
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa952c and 609cbef.

📒 Files selected for processing (20)
  • test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
  • test/e2e/cilium_network_policy_test.go
  • test/e2e/e2e_test.go
  • test/e2e/util/cilium.go
  • test/e2e/util/hypershift_framework.go
  • test/e2e/util/options.go
  • test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
  • test/e2e/e2e_test.go
  • test/e2e/util/options.go
  • test/e2e/util/hypershift_framework.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/cilium_network_policy_test.go
  • test/e2e/util/util.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
  • test/e2e/util/cilium.go
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
  • test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
  • test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
🧬 Code graph analysis (1)
test/e2e/util/cilium.go (2)
support/azureutil/azureutil.go (1)
  • IsAroHCP (217-219)
test/e2e/util/util.go (3)
  • ApplyYAML (4194-4264)
  • WaitForDeploymentAvailable (4163-4175)
  • WaitForDaemonSetReady (4178-4190)
🪛 Checkov (3.2.334)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml

[medium] 1-75: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-75: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 1-75: Containers should not share the host network namespace

(CKV_K8S_19)

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml

[medium] 1-82: Minimize wildcard use in Roles and ClusterRoles

(CKV_K8S_49)

test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml

[medium] 3-58: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 3-58: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 59-115: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 59-115: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 116-180: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 116-180: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 116-180: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 181-236: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 181-236: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 237-292: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 237-292: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 293-338: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 293-338: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 339-394: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 339-394: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 395-450: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 395-450: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 451-516: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 451-516: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 517-582: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 517-582: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 583-649: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 583-649: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 583-649: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 650-716: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 650-716: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 650-716: Containers should not share the host network namespace

(CKV_K8S_19)


[medium] 717-782: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 717-782: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 783-848: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 783-848: Minimize the admission of root containers

(CKV_K8S_23)

🪛 golangci-lint (2.5.0)
test/e2e/util/cilium.go

[error] 228-228: : # github.com/openshift/hypershift/sync-global-pullsecret [github.com/openshift/hypershift/sync-global-pullsecret.test]
sync-global-pullsecret/sync-global-pullsecret_test.go:228:23: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:234:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:247:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:257:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:270:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:283:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:296:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:309:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:327:12: undefined: NewMockdbusConn

(typecheck)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
  • GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (7)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml (1)

1-13: LGTM!

Standard RoleBinding correctly binding the cilium-olm Role to the cilium-olm ServiceAccount in the cilium namespace.

test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml (1)

1-41: LGTM!

Well-structured CRD definition for CiliumConfig with appropriate x-kubernetes-preserve-unknown-fields for flexible operator configuration and status subresource enabled.

test/e2e/cilium_network_policy_test.go (2)

43-51: LGTM!

Good test structure with proper resource cleanup via defer. The nested ARO HCP check correctly differentiates between generic Azure and ARO HCP-specific requirements.


17-25: No changes needed. The comment accurately describes the test implementation. EnsureCiliumConnectivityTestResources internally checks Cilium pod readiness as its first step via the CheckCiliumPodsRunning subtest before proceeding with subsequent setup operations. The documented steps align with the actual execution flow.

test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml (1)

1-2: Auto-generated test manifest - static analysis findings are expected.

This is a standard Cilium connectivity test suite auto-generated by Makefile. The hostNetwork: true configurations and lack of explicit security contexts are intentional for testing various networking scenarios (host-to-pod, cross-node, nodeport access). The static analysis warnings about privilege escalation and host networking are false positives in this context.

test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml (1)

1-12: LGTM!

Standard ClusterRoleBinding manifest that correctly binds the cilium-cilium ClusterRole to the cilium-olm ServiceAccount in the cilium namespace.

test/e2e/util/cilium.go (1)

81-114: LGTM - Appropriately permissive SCC for CNI workload.

The SecurityContextConstraints are correctly configured for a CNI provider like Cilium, which requires elevated privileges (host network, privileged containers, etc.). The permissions are scoped to specific service accounts in the cilium namespace.

Comment on lines +4198 to +4211
if strings.HasPrefix(pathOrURL, "http://") || strings.HasPrefix(pathOrURL, "https://") {
// Download YAML content
resp, err := http.Get(pathOrURL)
if err != nil {
return fmt.Errorf("failed to download manifest from %s: %w", pathOrURL, err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("failed to download manifest from %s: HTTP %d", pathOrURL, resp.StatusCode)
}
yamlContent, err = io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read manifest content: %w", err)
}
Contributor


⚠️ Potential issue | 🟠 Major

Add timeout to HTTP client for reliability.

The http.Get call at line 4200 uses the default HTTP client with no timeout, which could hang indefinitely if the server is unresponsive. This is particularly problematic in CI/CD environments or when network conditions are poor.

🔎 Proposed fix
 	if strings.HasPrefix(pathOrURL, "http://") || strings.HasPrefix(pathOrURL, "https://") {
 		// Download YAML content
-		resp, err := http.Get(pathOrURL)
+		client := &http.Client{
+			Timeout: 30 * time.Second,
+		}
+		resp, err := client.Get(pathOrURL)
 		if err != nil {
 			return fmt.Errorf("failed to download manifest from %s: %w", pathOrURL, err)
 		}
 		defer resp.Body.Close()
 		if resp.StatusCode != http.StatusOK {
 			return fmt.Errorf("failed to download manifest from %s: HTTP %d", pathOrURL, resp.StatusCode)
 		}
 		yamlContent, err = io.ReadAll(resp.Body)
 		if err != nil {
 			return fmt.Errorf("failed to read manifest content: %w", err)
 		}
🤖 Prompt for AI Agents
In @test/e2e/util/util.go around lines 4198-4211, The HTTP download uses the
default client (http.Get) without a timeout which can hang; replace the
http.Get(pathOrURL) call with a dedicated http.Client having a reasonable
Timeout (e.g., 10s) and call client.Get(pathOrURL) instead, then keep the
existing resp.Body.Close, status code check and io.ReadAll handling; locate the
download block around the http.Get usage (the pathOrURL/http.Get/yamlContent
section) and swap in the timeout-capable client to ensure the request fails fast
on unresponsive hosts.

@openshift-ci
Contributor

openshift-ci bot commented Jan 5, 2026

@wewang58: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aks-4-21 609cbef link true /test e2e-aks-4-21


@wewang58
Copy link
Contributor Author

wewang58 commented Jan 6, 2026

@jparrill @mgencur @heliubj18 I submitted a new PR to make it easier to review. It is almost the same as the old PR; I just moved some templates to the assets directory. Could you review it again? Thanks.

@mgencur
Copy link
Contributor

mgencur commented Jan 8, 2026

The AKS test fails with:

failed to wait for DaemonSet ovnkube-node to be ready: context deadline exceeded

(waits for OVN but this is Cilium)

Contributor

@mgencur mgencur left a comment


Hi @wewang58, a few comments here and there. Thanks.

t.Skip("Skipping test because it requires Azure platform")
}

if globalOpts.ExternalCNIProvider != "cilium" {
Contributor


Can you extract "cilium" as const and re-use in the other places? It's used a few times in this PR.

Contributor Author


@mgencur Thanks for reviewing, I will update the others and test it.

flag.BoolVar(&globalOpts.HOInstallationOptions.EnableCIDebugOutput, "e2e.ho-enable-ci-debug-output", false, "Install the HyperShift Operator with extra CI debug output enabled. This is a HyperShift Operator installation option")
flag.StringVar(&globalOpts.HOInstallationOptions.PlatformMonitoring, "e2e.platform-monitoring", "All", "The option for enabling platform cluster monitoring when installing the HyperShift Operator. Valid values are: None, OperatorOnly, All. This is a HyperShift Operator installation option")
flag.BoolVar(&globalOpts.RunUpgradeTest, "upgrade.run-tests", false, "Run HyperShift Operator upgrade test")
flag.StringVar(&globalOpts.ExternalCNIProvider, "e2e.external-cni-provider", "", "the option supported third cni provider: cilium, calico")
Contributor


Does it make sense to mention "calico" here? Maybe something like this would be better: The option supports the following CNI providers: cilium
...and later we could add more if necessary.

CiliumDefaultTimeout = 10 * time.Minute
CiliumLongTimeout = 20 * time.Minute
CiliumShortTimeout = 2 * time.Minute
CiliumDefaultPollInterval = 10 * time.Second
Contributor


On the other PR, @jparrill was requesting those variables to be package private (AFAIK). #7077 (comment)
I'm just bringing this up here to speed up the loop.

if _, err := os.Stat(localPath); err == nil {
manifests = append(manifests, localPath)
} else {
manifests = append(manifests, fmt.Sprintf("%s/%s", remoteBase, f))
Contributor


I suppose you included the Cilium YAML files in this PR to avoid downloading them every time (e.g. for disconnected environments). Since we now have the files locally, why keep this fallback that downloads them?
It is unlikely that files would be accidentally removed from the repo.

func (h *hypershiftTest) before(hostedCluster *hyperv1.HostedCluster, opts *PlatformAgnosticOptions, platform hyperv1.PlatformType) {
h.Run("ValidateHostedCluster", func(t *testing.T) {
if platform != hyperv1.NonePlatform {
if platform != hyperv1.NonePlatform && hostedCluster.Spec.Networking.NetworkType != hyperv1.Other {
Contributor


This is a related comment on the PR that will trigger the test suite: https://github.com/openshift/release/pull/71383/files#r2672253127 . PTAL. I think it's important to set --network-type=Other there.

Contributor Author


You mean that if EXTERNAL_CNI_PROVIDER is not null, we should add --e2e.external-cni-provider and --network-type=Other in https://github.com/openshift/release/pull/71383/files#diff-36e084ed3de5a17840678199ccb286a3ff6b2e47b9b6ebbbf011a26e43932f5cL106, right? I will update that PR.

Contributor


Yeah. This is what I mean.

@wewang58
Contributor Author

wewang58 commented Jan 9, 2026

The AKS test fails with:

failed to wait for DaemonSet ovnkube-node to be ready: context deadline exceeded

(waits for OVN but this is Cilium)

It should not be related to my PR. Although I added Cilium code, it does not actually install the Cilium CNI here; you can see in the test log that --e2e.external-cni-provider="cilium" was not passed to the test run.

My CI PR, openshift/release#71383, depends on this PR.


Labels

area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants