Conversation

@Chee-Lu Chee-Lu commented Dec 16, 2025

What this PR does:

When --rhobs-monitoring=true is set (for ROSA HCP), enable CVO access to OBO Prometheus for conditional update risk evaluation.

The CVO deployment logic routes to different metrics endpoints based on the monitoring stack:

  • RHOBS stack (ROSA HCP): http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090
  • CoreOS stack (Self-managed HyperShift on OpenShift): https://thanos-querier.openshift-monitoring.svc:9092
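
A minimal sketch of this routing, for illustration only. The helper name `cvoMetricsArgs` and the example namespace are hypothetical; the URLs and container arguments are the ones used in this PR's deployment change.

```go
package main

import "fmt"

const (
	// Endpoints quoted in the PR description.
	rhobsPrometheusURL = "http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090"
	thanosQuerierURL   = "https://thanos-querier.openshift-monitoring.svc:9092"
	// Service CA bundle path passed to the CVO on the Thanos (HTTPS) path.
	serviceCABundle = "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
)

// cvoMetricsArgs is a hypothetical helper that returns the extra CVO
// container arguments for the selected monitoring stack.
func cvoMetricsArgs(useRHOBS bool, namespace string) []string {
	args := []string{"--use-dns-for-services=true"}
	baseURL := rhobsPrometheusURL
	if !useRHOBS {
		// Thanos Querier is served over HTTPS, so a CA bundle is needed.
		args = append(args, "--metrics-ca-bundle-file="+serviceCABundle)
		baseURL = thanosQuerierURL
	}
	return append(args, fmt.Sprintf("--metrics-url=%s?namespace=%s", baseURL, namespace))
}

func main() {
	// "clusters-example" is a made-up namespace used only for this demo.
	for _, arg := range cvoMetricsArgs(true, "clusters-example") {
		fmt.Println(arg)
	}
}
```

The RHOBS path omits the CA bundle argument because that endpoint is plain HTTP.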

Key changes:

  • CVO deployment enables metrics access when either --rhobs-monitoring (for ROSA HCP) or --enable-cvo-management-cluster-metrics-access (for self-managed HyperShift on OpenShift) is set
  • Network policies updated to allow egress to the appropriate monitoring endpoint based on stack configuration
  • Flag description updated to document automatic CVO metrics access behavior
  • Flags remain mutually exclusive to prevent misconfiguration
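
The flag semantics listed above can be sketched roughly as follows. The `installOptions` struct and its methods are hypothetical stand-ins, not the actual installer code; only the two flag names come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// installOptions is a hypothetical stand-in for the installer's options.
type installOptions struct {
	RHOBSMonitoring                         bool // --rhobs-monitoring
	EnableCVOManagementClusterMetricsAccess bool // --enable-cvo-management-cluster-metrics-access
}

// validate rejects the combination of both flags, since each one selects
// a different monitoring stack.
func (o installOptions) validate() error {
	if o.RHOBSMonitoring && o.EnableCVOManagementClusterMetricsAccess {
		return errors.New("--rhobs-monitoring and --enable-cvo-management-cluster-metrics-access are mutually exclusive")
	}
	return nil
}

// cvoMetricsAccess reports whether CVO metrics access should be enabled:
// either flag is sufficient.
func (o installOptions) cvoMetricsAccess() bool {
	return o.RHOBSMonitoring || o.EnableCVOManagementClusterMetricsAccess
}

func main() {
	opts := installOptions{RHOBSMonitoring: true}
	if err := opts.validate(); err != nil {
		fmt.Println("invalid flags:", err)
		return
	}
	fmt.Println("CVO metrics access enabled:", opts.cvoMetricsAccess())
}
```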

Which issue(s) this PR fixes:

fixes https://issues.redhat.com/browse/OCM-10395
fixes https://issues.redhat.com/browse/OCM-20970

Special notes for your reviewer:

Backport Requirements

This change should also be backported to 4.20.z and 4.21.z to benefit customers on those versions.

When --rhobs-monitoring=true is set (for ROSA HCP), automatically enable
CVO access to OBO Prometheus for conditional update risk evaluation. The
CVO deployment logic routes to different metrics endpoints based on the
monitoring stack:

- RHOBS stack (ROSA HCP): http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090
- CoreOS stack (Self-managed HyperShift on OpenShift): https://thanos-querier.openshift-monitoring.svc:9092

Key changes:
- CVO deployment enables metrics access when either --rhobs-monitoring
  (for ROSA HCP) or --enable-cvo-management-cluster-metrics-access
  (for self-managed HyperShift on OpenShift) is set
- Network policies updated to allow egress to the appropriate monitoring
  endpoint based on stack configuration
- Flag description updated to document automatic CVO metrics access behavior
- Flags remain mutually exclusive to prevent misconfiguration
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 16, 2025

openshift-ci-robot commented Dec 16, 2025

@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai bot commented Dec 16, 2025

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes add RHOBS (Red Hat Observability Services) monitoring support by introducing environment variable-driven logic to conditionally enable metrics access across installation, CVO deployment, and network policy components. This includes documentation updates, conditional metrics configuration with endpoint URL selection, and dynamic network policy rule generation based on the monitoring stack.

Changes

Cohort / File(s): Summary

  • Documentation and flag description (cmd/install/install.go): Updated --rhobs-monitoring flag description to document that it also enables CVO access to OBO Prometheus for conditional update risk evaluation.
  • CVO deployment metrics configuration (control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go): Introduced environment-variable-driven logic to conditionally enable CVO metrics access based on RHOBS monitoring status or explicit flag. Implements dual metrics endpoint paths: HTTP URL to RHOBS Prometheus when RHOBS is active, HTTPS URL to Thanos querier with TLS CA bundle otherwise. Conditionally appends metrics-related labels, service account usage, and container arguments.
  • Network policies for metrics access (hypershift-operator/controllers/hostedcluster/network_policies.go): Reworked reconcileMetricsServerNetworkPolicy to generate dynamic egress rules based on monitoring stack: port 9090 toward RHOBS Prometheus when active, port 9092 toward Thanos Querier when inactive. Conditional policy reconciliation tied to RHOBS monitoring environment variable or explicit flag.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Metrics endpoint URL selection logic: Verify correct endpoint routing for both RHOBS and Thanos paths, including TLS certificate handling differences
  • Network policy egress rule configuration: Confirm port assignments (9090 vs 9092) and namespace references are correct for each monitoring stack
  • Environment variable consumption patterns: Ensure consistent environment variable naming and check behavior across all three files
  • Conditional logic branches: Review all branches of the RHOBS monitoring condition to ensure complete coverage and no missing edge cases

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Dec 16, 2025

Chee-Lu commented Dec 16, 2025

/auto-cc

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 16, 2025

openshift-ci bot commented Dec 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Chee-Lu
Once this PR has been reviewed and has the lgtm label, please assign muraee for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the area/cli Indicates the PR includes changes for CLI label Dec 16, 2025

openshift-ci bot commented Dec 16, 2025

Hi @Chee-Lu. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Dec 16, 2025
@Chee-Lu Chee-Lu changed the title OCM-10395: enable CVO metrics access with RHOBS monitoring flag OCM-10395: feat(monitoring): enable CVO metrics access with RHOBS monitoring flag Dec 16, 2025
@openshift-ci openshift-ci bot requested review from enxebre and muraee December 16, 2025 16:52

@sdminonne

@coderabbitai review

coderabbitai bot commented Dec 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)

102-119: Consider extracting duplicate enableMetricsAccess check.

The enableMetricsAccess variable is computed identically on lines 33 and 103. Consider computing it once at the function start and reusing it to reduce duplication.

The conditional metrics URL configuration looks correct, routing to the appropriate monitoring endpoint (RHOBS Prometheus vs CoreOS Thanos) with the correct protocols.

Apply this diff to eliminate the duplicate check:

 func (cvo *clusterVersionOperator) adaptDeployment(cpContext component.WorkloadContext, deployment *appsv1.Deployment) error {
+	// Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set
+	enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess
+
-	// Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set
-	enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess
-
 	if enableMetricsAccess {
 		if deployment.Spec.Template.Labels == nil {
 			deployment.Spec.Template.Labels = map[string]string{}
 		}
 		deployment.Spec.Template.Labels[config.NeedMetricsServerAccessLabel] = "true"
 		deployment.Spec.Template.Spec.ServiceAccountName = ComponentName
 	}
 
 	featureSet := configv1.Default
 	if cpContext.HCP.Spec.Configuration != nil && cpContext.HCP.Spec.Configuration.FeatureGate != nil {
 		featureSet = cpContext.HCP.Spec.Configuration.FeatureGate.FeatureSet
 	}
 
 	// ... (rest of the function)
 
 	util.UpdateContainer(ComponentName, deployment.Spec.Template.Spec.Containers, func(c *corev1.Container) {
 		util.UpsertEnvVar(c, corev1.EnvVar{
 			Name:  "RELEASE_IMAGE",
 			Value: dataPlaneReleaseImage,
 		})
 
 		if updateService := cpContext.HCP.Spec.UpdateService; updateService != "" {
 			c.Args = append(c.Args, "--update-service", string(updateService))
 		}
 
-		// Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set
-		enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess
-
 		if enableMetricsAccess {
 			c.Args = append(c.Args, "--use-dns-for-services=true")
 
 			// Configure metrics endpoint based on monitoring stack
 			var metricsURL string
 			if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" {
 				// RHOBS Prometheus uses HTTP without TLS
 				metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace)
 			} else {
 				// CoreOS Thanos uses HTTPS with service CA
 				c.Args = append(c.Args, "--metrics-ca-bundle-file=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt")
 				metricsURL = fmt.Sprintf("https://thanos-querier.openshift-monitoring.svc:9092?namespace=%s", cpContext.HCP.Namespace)
 			}
 			c.Args = append(c.Args, fmt.Sprintf("--metrics-url=%s", metricsURL))
 		}
 	})
 
 	return nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 400c97a and 221e4a2.

📒 Files selected for processing (3)
  • cmd/install/install.go (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (4 hunks)
  • hypershift-operator/controllers/hostedcluster/network_policies.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • cmd/install/install.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
🔇 Additional comments (4)
cmd/install/install.go (1)

272-272: LGTM: Documentation clarifies RHOBS monitoring behavior.

The updated flag description accurately reflects the additional CVO metrics access capability enabled when --rhobs-monitoring is set.

control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)

32-41: LGTM: Metrics access properly gated.

The logic correctly enables metrics access when either RHOBS monitoring is active (via environment variable) or the explicit flag is set. The label and service account configuration support the network policy egress rules.

hypershift-operator/controllers/hostedcluster/network_policies.go (2)

88-97: LGTM: Metrics access network policy properly gated.

The logic correctly creates the metrics server network policy when either RHOBS monitoring is active or the explicit flag is set, consistent with the CVO deployment configuration.


836-910: LGTM: Network policy properly routes to monitoring stack.

The refactored function correctly creates egress rules based on the monitoring stack:

  • RHOBS: targets OBO Prometheus (port 9090)
  • CoreOS: targets Thanos Querier (port 9092)

This aligns with the CVO deployment changes that configure the corresponding metrics URLs. The pod and namespace selectors appropriately identify the monitoring components.
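
As a rough illustration of the stack-dependent egress target described above, the selection can be modeled like this. The `egressTarget` type and `metricsEgressTarget` function are hypothetical; the namespaces are inferred from the service DNS names in the metrics URLs, and the real policy additionally uses pod and namespace selectors that are not shown here.

```go
package main

import "fmt"

// egressTarget is a hypothetical, simplified view of where the CVO pod
// is allowed to send metrics queries.
type egressTarget struct {
	Namespace string
	Port      int32
}

// metricsEgressTarget picks the egress target for the active monitoring
// stack: OBO Prometheus on 9090 for RHOBS, Thanos Querier on 9092 otherwise.
func metricsEgressTarget(useRHOBS bool) egressTarget {
	if useRHOBS {
		return egressTarget{Namespace: "openshift-observability-operator", Port: 9090}
	}
	return egressTarget{Namespace: "openshift-monitoring", Port: 9092}
}

func main() {
	for _, rhobs := range []bool{true, false} {
		t := metricsEgressTarget(rhobs)
		fmt.Printf("rhobs=%t -> namespace=%s port=%d\n", rhobs, t.Namespace, t.Port)
	}
}
```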

@Chee-Lu Chee-Lu changed the title OCM-10395: feat(monitoring): enable CVO metrics access with RHOBS monitoring flag OCPBUGS-69447: feat(updates): enable CVO metrics access with RHOBS monitoring flag Dec 16, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 16, 2025
@openshift-ci-robot

@Chee-Lu: This pull request references Jira Issue OCPBUGS-69447, which is invalid:

  • expected the bug to target only the "4.22.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Chee-Lu commented Dec 17, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 17, 2025
@openshift-ci-robot

@Chee-Lu: This pull request references Jira Issue OCPBUGS-69447, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

var metricsURL string
if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" {
	// RHOBS Prometheus uses HTTP without TLS
	metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace)
Contributor

I'm not sure that in ROSA land, the monitoring stack honors the namespace parameter (in fact I'd bet that it doesn't). Also hardcoding the URL might not work for other environments (e.g. GCP/ARO).

@Chee-Lu Chee-Lu Dec 22, 2025

Thanks for the feedback!

> Also hardcoding the URL might not work for other environments (e.g. GCP/ARO).

ARO and GCP have separate tickets for their implementations (Ticket for ARO).

How should I scope this to ROSA only while leaving room for ARO/GCP teams to add their platform-specific configurations later? Or do you have other suggestions on how to handle this?

Contributor

> How should I scope this to ROSA only while leaving room for ARO/GCP teams to add their platform-specific configurations later? Or do you have other suggestions on how to handle this?

I'm not a HyperShift engineer, so I can't really tell what the best option is. In any case, I have bigger concerns about the design itself, as I'm not sure that it will work as expected...

Contributor

And about the URL, my gut feeling is that it isn't a HyperShift concern but rather something that would be provided via configuration (e.g. what if ROSA decides to change the service name?).
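
One possible reading of this suggestion, as a sketch only: resolve the endpoint from configuration and fall back to the current value. The environment variable name `CVO_METRICS_URL` and the function `resolveMetricsURL` are made up for illustration and do not exist in HyperShift.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// metricsURLEnvVar is a hypothetical override knob, not a real HyperShift setting.
	metricsURLEnvVar = "CVO_METRICS_URL"
	// defaultRHOBSMetricsURL is the URL currently hardcoded in this PR.
	defaultRHOBSMetricsURL = "http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090"
)

// resolveMetricsURL returns the configured URL when one is set and
// non-empty, otherwise the hardcoded default. The lookup function is
// injected so the logic can be exercised without touching the process
// environment.
func resolveMetricsURL(lookup func(string) (string, bool)) string {
	if v, ok := lookup(metricsURLEnvVar); ok && v != "" {
		return v
	}
	return defaultRHOBSMetricsURL
}

func main() {
	fmt.Println(resolveMetricsURL(os.LookupEnv))
}
```

With this shape, a service rename on the ROSA side would need only a configuration change rather than a new HyperShift release.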

var metricsURL string
if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" {
	// RHOBS Prometheus uses HTTP without TLS
	metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace)
Member

We can determine this using a check like the following:

func isROSAHCP(hc *hyperv1.HostedCluster) bool {
	if hc.Spec.Platform.AWS == nil {
		return false
	}

	for _, tag := range hc.Spec.Platform.AWS.ResourceTags {
		if tag.Key == "red-hat-managed" && tag.Value == "true" {
			return true
		}
	}
	return false
}

@Chee-Lu Chee-Lu Jan 5, 2026

Thanks, I added the check to the CVO deployment and the network policies.
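
Roughly, the tag check suggested above combines with the RHOBS setting as sketched below. The types here are simplified stand-ins for the `hyperv1` API (the real field path is `hc.Spec.Platform.AWS.ResourceTags`), and `useRHOBSEndpoint` is a hypothetical name for illustration.

```go
package main

import "fmt"

// Simplified stand-ins for the hyperv1 types; only the fields the check
// reads are modeled, and the Spec.Platform nesting is flattened.
type resourceTag struct{ Key, Value string }
type awsPlatform struct{ ResourceTags []resourceTag }
type hostedCluster struct{ AWS *awsPlatform }

// isROSAHCP mirrors the suggested check: an AWS cluster carrying the
// red-hat-managed=true resource tag.
func isROSAHCP(hc hostedCluster) bool {
	if hc.AWS == nil {
		return false
	}
	for _, tag := range hc.AWS.ResourceTags {
		if tag.Key == "red-hat-managed" && tag.Value == "true" {
			return true
		}
	}
	return false
}

// useRHOBSEndpoint (hypothetical name) selects the RHOBS Prometheus
// endpoint only when RHOBS monitoring is on and the cluster is ROSA HCP.
func useRHOBSEndpoint(rhobsEnabled bool, hc hostedCluster) bool {
	return rhobsEnabled && isROSAHCP(hc)
}

func main() {
	rosa := hostedCluster{AWS: &awsPlatform{ResourceTags: []resourceTag{{Key: "red-hat-managed", Value: "true"}}}}
	other := hostedCluster{}
	fmt.Println(useRHOBSEndpoint(true, rosa), useRHOBSEndpoint(true, other))
}
```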

Chee-Lu and others added 2 commits December 30, 2025 16:20
Remove redundant variable declaration on line 103 that duplicated
the enableMetricsAccess variable already defined on line 33.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add ROSA HCP detection to ensure RHOBS monitoring configuration only
applies to ROSA HCP clusters. Non-ROSA clusters (ARO, GCP, self-managed)
will continue using the default CoreOS Thanos metrics stack.

Changes:
- Add isROSAHCP() function to detect ROSA HCP clusters via red-hat-managed tag
- Update CVO deployment to use RHOBS Prometheus only for ROSA HCP
- Update metrics server network policy to scope RHOBS egress rules to ROSA HCP
- Clarify documentation to distinguish ROSA HCP vs self-managed HyperShift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@Chee-Lu Chee-Lu force-pushed the simplify-rhobs-cvo-metrics-access branch from 17190ba to c7c5b9c Compare January 5, 2026 11:00
Labels

area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.
